PyPI - mlarray - Versions diffs - 0.0.52__tar.gz → 0.0.53__tar.gz - Mend

mlarray 0.0.52__tar.gz → 0.0.53__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (58)

{mlarray-0.0.52 → mlarray-0.0.53}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: mlarray
-Version: 0.0.52
+Version: 0.0.53
 Summary: Array format specialized for Machine Learning with Blosc2 backend and standardized metadata.
 Author-email: Karol Gotkowski <karol.gotkowski@dkfz.de>
 License: MIT
@@ -236,18 +236,10 @@ mlarray_header sample.mla
 ### mlarray_convert
-Convert between MLArray and NIfTI/NRRD files.
-When converting from NIfTI/NRRD to MLArray, source metadata is copied into
-`meta.source`.
-When converting from MLArray to NIfTI/NRRD, only `meta.source` is copied into
-the output header. Spatial metadata (`spacing`, `origin`, `direction`) is set
-explicitly from `meta.spatial`.
+Convert a NIfTI or NRRD file to MLArray and copy metadata.
 ```bash
 mlarray_convert sample.nii.gz output.mla
-mlarray_convert sample.mla output.nii.gz
 ```
 ## Contributing

{mlarray-0.0.52 → mlarray-0.0.53}/README.md RENAMED

@@ -202,18 +202,10 @@ mlarray_header sample.mla
 ### mlarray_convert
-Convert between MLArray and NIfTI/NRRD files.
-When converting from NIfTI/NRRD to MLArray, source metadata is copied into
-`meta.source`.
-When converting from MLArray to NIfTI/NRRD, only `meta.source` is copied into
-the output header. Spatial metadata (`spacing`, `origin`, `direction`) is set
-explicitly from `meta.spatial`.
+Convert a NIfTI or NRRD file to MLArray and copy metadata.
 ```bash
 mlarray_convert sample.nii.gz output.mla
-mlarray_convert sample.mla output.nii.gz
 ```
 ## Contributing

mlarray-0.0.53/docs/cli.md ADDED

@@ -0,0 +1,29 @@
+# CLI
+MLArray includes a small command-line interface for common tasks such as **inspecting file headers** and **converting existing image formats** into MLArray. This is especially useful when you want to quickly verify metadata, debug a dataset, or batch-convert files without writing Python code.
+The CLI currently focuses on core workflows (header inspection and conversion). Support for converting a wider range of image formats will be added over time.
+---
+## `mlarray_header`
+Print the metadata header from a `.mla` file.
+This command is useful for quickly checking spatial metadata, stored schemas, and other file-level information without loading the full array into memory.
+```bash
+mlarray_header sample.mla
+```
+---
+## `mlarray_convert`
+Convert a NIfTI or NRRD file to MLArray and copy metadata.
+This provides an easy way to bring existing medical imaging data into an MLArray-based workflow while preserving the original metadata for downstream analysis and visualization.
+```bash
+mlarray_convert sample.nii.gz output.mla
+```

mlarray-0.0.53/mlarray/cli.py ADDED

@@ -0,0 +1,60 @@
+import argparse
+import json
+from typing import Union
+from pathlib import Path
+from mlarray import MLArray
+from mlarray.meta import _meta_internal_write
+try:
+    from medvol import MedVol
+except ImportError:
+    MedVol = None
+def print_header(filepath: Union[str, Path]) -> None:
+    """Print the MLArray metadata header for a file.
+    Args:
+        filepath: Path to a ".mla" file.
+    """
+    meta = MLArray(filepath).meta
+    if meta is None:
+        print("null")
+        return
+    print(json.dumps(meta.to_plain(include_none=True), indent=2, sort_keys=True))
+def convert_to_mlarray(load_filepath: Union[str, Path], save_filepath: Union[str, Path]):
+    if MedVol is None:
+        raise RuntimeError("medvol is required for mlarray_convert; install with 'pip install mlarray[all]'.")
+    image_meta_format = None
+    if str(load_filepath).endswith(".nii.gz") or str(load_filepath).endswith(".nii"):
+        image_meta_format = "nifti"
+    elif str(load_filepath).endswith(".nrrd"):
+        image_meta_format = "nrrd"
+    image_medvol = MedVol(load_filepath)
+    image_mlarray = MLArray(image_medvol.array, spacing=image_medvol.spacing, origin=image_medvol.origin, direction=image_medvol.direction, meta=image_medvol.header)
+    with _meta_internal_write():
+        image_mlarray.meta._image_meta_format = image_meta_format
+    image_mlarray.save(save_filepath)
+def cli_print_header() -> None:
+    parser = argparse.ArgumentParser(
+        prog="mlarray_header",
+        description="Print the MLArray metadata header for a file.",
+    )
+    parser.add_argument("filepath", help="Path to a .mla file.")
+    args = parser.parse_args()
+    print_header(args.filepath)
+def cli_convert_to_mlarray() -> None:
+    parser = argparse.ArgumentParser(
+        prog="mlarray_convert",
+        description="Convert a NIfTI or NRRD file to MLArray and copy all metadata.",
+    )
+    parser.add_argument("load_filepath", help="Path to the NIfTI (.nii.gz, .nii) or NRRD (.nrrd) file to load.")
+    parser.add_argument("save_filepath", help="Path to the MLArray (.mla) file to save.")
+    args = parser.parse_args()
+    convert_to_mlarray(args.load_filepath, args.save_filepath)
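The suffix check inside `convert_to_mlarray` can be read in isolation. The following is a minimal sketch of that logic only; `detect_image_meta_format` is a hypothetical helper name that does not appear in the package, and the sketch leaves out loading through `medvol` and saving.

```python
from pathlib import Path
from typing import Optional, Union


def detect_image_meta_format(path: Union[str, Path]) -> Optional[str]:
    """Map a file name to the metadata format label used in the diff above.

    Hypothetical helper for illustration; it mirrors the suffix tests in
    convert_to_mlarray and returns None for any other suffix.
    """
    name = str(path)
    # str.endswith accepts a tuple, so both NIfTI suffixes are tested at once.
    if name.endswith((".nii.gz", ".nii")):
        return "nifti"
    if name.endswith(".nrrd"):
        return "nrrd"
    return None


print(detect_image_meta_format("sample.nii.gz"))
print(detect_image_meta_format(Path("scan.nrrd")))
print(detect_image_meta_format("volume.mha"))
```

As in the released code, an unrecognized suffix yields `None` rather than an error, so the format label is simply left unset for such files.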

{mlarray-0.0.52 → mlarray-0.0.53}/mlarray/mlarray.py RENAMED

@@ -12,9 +12,6 @@ from mlarray.meta import (
     _spatial_axis_mask,
     _meta_internal_write,
 )
-from mlarray.blosc2_layout_strategies import (
-    comp_blosc2_params_spatial_only_magnitude,
-)
 from mlarray.utils import is_serializable
 import pickle
 import gzip
@@ -1327,8 +1324,8 @@ class MLArray:
     @classmethod
     def comp_blosc2_params(
             cls,
-            image_size: Tuple[int, ...],
-            patch_size: Tuple[int, ...],
+            image_size: Union[Tuple[int, int], Tuple[int, int, int], Tuple[int, int, int, int]],
+            patch_size: Union[Tuple[int, int], Tuple[int, int, int]],
             spatial_axis_mask: Optional[list[bool]] = None,
             bytes_per_pixel: int = 4,  # 4 bytes per element for float32
             l1_cache_size_per_core_in_bytes: int = 32768,  # 1 Kibibyte (KiB) = 2^10 Byte;  32 KiB = 32768 Byte
@@ -1336,35 +1333,31 @@ class MLArray:
             safety_factor: float = 0.8  # we don't fill the caches to the brim. 0.8 means we target 80% of the caches
         ):
         """
-        Compute recommended Blosc2 chunk and block sizes from a patch-size hint.
-        This method uses the ``comp_blosc2_params_spatial_only_magnitude``
-        strategy from :mod:`mlarray.blosc2_layout_strategies`.
-        Strategy summary:
-            1. Split axes into spatial and non-spatial using
-               ``spatial_axis_mask``.
-            2. Keep non-spatial axes at ``1`` in both blocks and chunks so the
-               layout is driven by spatial patch sampling instead of stretching
-               cache budgets across non-spatial dimensions.
-            3. Grow block sizes along spatial axes under the L1 cache budget.
-               Growth is weighted by the relative magnitude of the requested
-               patch size, so larger patch axes are allowed to grow faster.
-            4. Grow chunk sizes in multiples of the block sizes under the L3
-               cache budget, again weighted by patch-size magnitude.
-            5. Enforce structural constraints that keep the layout regular:
-               non-clipped spatial axes stay even, and non-clipped chunk axes
-               remain multiples of their corresponding block axes.
-        This strategy supports arbitrary numbers of spatial and non-spatial axes
-        as long as the patch size dimensionality matches the number of spatial axes.
+        Computes a recommended block and chunk size for saving arrays with Blosc v2.
+        Blosc2 NDIM documentation:
+        "Having a second partition allows for greater flexibility in fitting different partitions to different CPU cache levels.
+        Typically, the first partition (also known as chunks) should be sized to fit within the L3 cache,
+        while the second partition (also known as blocks) should be sized to fit within the L2 or L1 caches,
+        depending on whether the priority is compression ratio or speed."
+        (Source: https://www.blosc.org/posts/blosc2-ndim-intro/)
+        Our approach is not fully optimized for this yet.
+        Currently, we aim to fit the uncompressed block within the L1 cache, accepting that it might occasionally spill over into L2, which we consider acceptable.
+        Note: This configuration is specifically optimized for nnU-Net data loading, where each read operation is performed by a single core, so multi-threading is not an option.
+        The default cache values are based on an older Intel 4110 CPU with 32KB L1, 128KB L2, and 1408KB L3 cache per core.
+        We haven't further optimized for modern CPUs with larger caches, as our data must still be compatible with the older systems.
         Args:
-            image_size (Tuple[int, ...]): Full array shape.
-            patch_size (Tuple[int, ...]): Patch size over spatial axes only.
-            spatial_axis_mask (Optional[list[bool]]): Mask indicating for every
-                array axis whether it is spatial. If omitted, all axes are
-                treated as spatial.
+            image_size (Union[Tuple[int, int], Tuple[int, int, int], Tuple[int, int, int, int]]):
+                Image shape. Use a 2D, 3D, or 4D size; 2D/3D inputs are
+                internally expanded to 4D (with non-spatial axes first).
+            patch_size (Union[Tuple[int, int], Tuple[int, int, int]]): Patch
+                size for spatial dimensions. Use a 2-tuple (x, y) or 3-tuple
+                (x, y, z).
+            spatial_axis_mask (Optional[list[bool]]): Mask indicating for every axis whether it is spatial or not.
             bytes_per_pixel (int): Number of bytes per element. Defaults to 4
                 for float32.
             l1_cache_size_per_core_in_bytes (int): L1 cache per core in bytes.
@@ -1374,15 +1367,93 @@ class MLArray:
         Returns:
             Tuple[List[int], List[int]]: Recommended chunk size and block size.
         """
-        return comp_blosc2_params_spatial_only_magnitude(
-            image_size=tuple(int(v) for v in image_size),
-            patch_size=tuple(int(v) for v in patch_size),
-            spatial_axis_mask=spatial_axis_mask,
-            bytes_per_pixel=bytes_per_pixel,
-            l1_cache_size_per_core_in_bytes=l1_cache_size_per_core_in_bytes,
-            l3_cache_size_per_core_in_bytes=l3_cache_size_per_core_in_bytes,
-            safety_factor=safety_factor,
-        )
+        def _move_index_list(a, src, dst):
+            a = list(a)
+            x = a.pop(src)
+            a.insert(dst, x)
+            return a
+        num_squeezes = 0
+        if len(image_size) == 2:
+            image_size = (1, 1, *image_size)
+            num_squeezes = 2
+        elif len(image_size) == 3:
+            image_size = (1, *image_size)
+            num_squeezes = 1
+        non_spatial_axis = None
+        if spatial_axis_mask is not None:
+            non_spatial_axis_mask = [not b for b in spatial_axis_mask]
+            if sum(non_spatial_axis_mask) > 1:
+                raise RuntimeError("Automatic blosc2 optimization currently only supports one non-spatial axis. Please set chunk and block size manually.")
+            non_spatial_axis = next((i for i, v in enumerate(non_spatial_axis_mask) if v), None)
+            if non_spatial_axis is not None:
+                image_size = _move_index_list(image_size, non_spatial_axis+num_squeezes, 0)
+        if len(image_size) != 4:
+            raise RuntimeError("Image size must be 4D.")
+        if not (len(patch_size) == 2 or len(patch_size) == 3):
+            raise RuntimeError("Patch size must be 2D or 3D.")
+        non_spatial_size = image_size[0]
+        if len(patch_size) == 2:
+            patch_size = [1, *patch_size]
+        patch_size = np.array(patch_size)
+        block_size = np.array((non_spatial_size, *[2 ** (max(0, math.ceil(math.log2(i)))) for i in patch_size]))
+        # shrink the block size until it fits in L1
+        estimated_nbytes_block = np.prod(block_size) * bytes_per_pixel
+        while estimated_nbytes_block > (l1_cache_size_per_core_in_bytes * safety_factor):
+            # pick largest deviation from patch_size that is not 1
+            axis_order = np.argsort(block_size[1:] / patch_size)[::-1]
+            idx = 0
+            picked_axis = axis_order[idx]
+            while block_size[picked_axis + 1] == 1:
+                idx += 1
+                picked_axis = axis_order[idx]
+            # now reduce that axis to the next lowest power of 2
+            block_size[picked_axis + 1] = 2 ** (max(0, math.floor(math.log2(block_size[picked_axis + 1] - 1))))
+            block_size[picked_axis + 1] = min(block_size[picked_axis + 1], image_size[picked_axis + 1])
+            estimated_nbytes_block = np.prod(block_size) * bytes_per_pixel
+        block_size = np.array([min(i, j) for i, j in zip(image_size, block_size)])
+        # note: there is no use extending the chunk size to 3d when we have a 2d patch size! This would unnecessarily
+        # load data into L3
+        # now tile the blocks into chunks until we hit image_size or the l3 cache per core limit
+        chunk_size = deepcopy(block_size)
+        estimated_nbytes_chunk = np.prod(chunk_size) * bytes_per_pixel
+        while estimated_nbytes_chunk < (l3_cache_size_per_core_in_bytes * safety_factor):
+            if patch_size[0] == 1 and all([i == j for i, j in zip(chunk_size[2:], image_size[2:])]):
+                break
+            if all([i == j for i, j in zip(chunk_size, image_size)]):
+                break
+            # find axis that deviates from block_size the most
+            axis_order = np.argsort(chunk_size[1:] / block_size[1:])
+            idx = 0
+            picked_axis = axis_order[idx]
+            while chunk_size[picked_axis + 1] == image_size[picked_axis + 1] or patch_size[picked_axis] == 1:
+                idx += 1
+                picked_axis = axis_order[idx]
+            chunk_size[picked_axis + 1] += block_size[picked_axis + 1]
+            chunk_size[picked_axis + 1] = min(chunk_size[picked_axis + 1], image_size[picked_axis + 1])
+            estimated_nbytes_chunk = np.prod(chunk_size) * bytes_per_pixel
+            if np.mean([i / j for i, j in zip(chunk_size[1:], patch_size)]) > 1.5:
+                # chunk size should not exceed patch size * 1.5 on average
+                chunk_size[picked_axis + 1] -= block_size[picked_axis + 1]
+                break
+        # better safe than sorry
+        chunk_size = [min(i, j) for i, j in zip(image_size, chunk_size)]
+        if non_spatial_axis is not None:
+            block_size = _move_index_list(block_size, 0, non_spatial_axis+num_squeezes)
+            chunk_size = _move_index_list(chunk_size, 0, non_spatial_axis+num_squeezes)
+        block_size = block_size[num_squeezes:]
+        chunk_size = chunk_size[num_squeezes:]
+        return [int(value) for value in chunk_size], [int(value) for value in block_size]
     def _open(
             self,
@@ -1814,6 +1885,9 @@ class MLArray:
             MetaBlosc2: Validated Blosc2 metadata instance.
         """
         num_spatial_axes = sum(spatial_axis_mask)
+        num_non_spatial_axes = sum([not b for b in spatial_axis_mask])
+        if patch_size is not None and patch_size != "default" and (num_spatial_axes == 1 or num_spatial_axes > 3 or num_non_spatial_axes > 1):
+            raise NotImplementedError("Chunk and block size optimization based on patch size is only implemented for 2D and 3D spatial images with at most one further non-spatial axis. Please set the chunk and block size manually or set to None for blosc2 to determine a chunk and block size.")
         if patch_size is not None and patch_size != "default" and (chunk_size is not None or block_size is not None):
             raise RuntimeError("patch_size and chunk_size / block_size cannot both be explicitly set.")
         if (chunk_size is not None and block_size is None) or (chunk_size is None and block_size is not None):
@@ -1830,14 +1904,7 @@ class MLArray:
         if chunk_size is not None or block_size is not None:
             patch_size = None
-        patch_size = [patch_size] * num_spatial_axes if isinstance(patch_size, int) else patch_size
-        if patch_size is not None and num_spatial_axes == 0:
-            raise RuntimeError(
-                "Automatic patch-size optimization requires at least one spatial axis. "
-                "Set patch_size=None and provide chunk_size/block_size manually, "
-                "or let Blosc2 determine the layout."
-            )
+        patch_size = [patch_size] * len(shape) if isinstance(patch_size, int) else patch_size
         if patch_size is not None:
             chunk_size, block_size = MLArray.comp_blosc2_params(shape, patch_size, spatial_axis_mask, bytes_per_pixel=dtype_itemsize)
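The block-size half of the new `comp_blosc2_params` body can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the packaged code: `next_power_of_two` and `shrink_block_to_l1` are hypothetical names, clipping to `image_size` and the reordering of a non-spatial axis are omitted, ties between axes are broken by position instead of by `np.argsort`, and a guard is added for the case where no axis can shrink further.

```python
import math


def next_power_of_two(n: int) -> int:
    """Round n up to the nearest power of two (1 stays 1)."""
    return 2 ** max(0, math.ceil(math.log2(n)))


def shrink_block_to_l1(patch_size, non_spatial_size=1, bytes_per_pixel=4,
                       l1_bytes=32768, safety_factor=0.8):
    """Return [non_spatial, *spatial] block sizes that fit the L1 budget.

    Hypothetical, simplified version of the block-size loop in the diff.
    """
    # Start from the patch size rounded up to powers of two.
    spatial = [next_power_of_two(p) for p in patch_size]
    budget = l1_bytes * safety_factor
    while non_spatial_size * math.prod(spatial) * bytes_per_pixel > budget:
        # Only axes larger than 1 can still be reduced.
        candidates = [i for i, b in enumerate(spatial) if b > 1]
        if not candidates:
            break  # guard added in this sketch; not present in the original
        # Pick the axis that overshoots its patch size the most.
        axis = max(candidates, key=lambda i: spatial[i] / patch_size[i])
        # Drop that axis to the next lower power of two.
        spatial[axis] = 2 ** max(0, math.floor(math.log2(spatial[axis] - 1)))
    return [non_spatial_size, *spatial]


print(shrink_block_to_l1((20, 20, 20)))
```

With the defaults (float32, 32 KiB of L1, safety factor 0.8) the budget is 26214.4 bytes. A 20×20×20 patch starts at 32×32×32 (131072 bytes) and each axis is halved once, ending at 16×16×16 (16384 bytes).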

{mlarray-0.0.52 → mlarray-0.0.53}/mlarray.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: mlarray
-Version: 0.0.52
+Version: 0.0.53
 Summary: Array format specialized for Machine Learning with Blosc2 backend and standardized metadata.
 Author-email: Karol Gotkowski <karol.gotkowski@dkfz.de>
 License: MIT
@@ -236,18 +236,10 @@ mlarray_header sample.mla
 ### mlarray_convert
-Convert between MLArray and NIfTI/NRRD files.
-When converting from NIfTI/NRRD to MLArray, source metadata is copied into
-`meta.source`.
-When converting from MLArray to NIfTI/NRRD, only `meta.source` is copied into
-the output header. Spatial metadata (`spacing`, `origin`, `direction`) is set
-explicitly from `meta.spatial`.
+Convert a NIfTI or NRRD file to MLArray and copy metadata.
 ```bash
 mlarray_convert sample.nii.gz output.mla
-mlarray_convert sample.mla output.nii.gz
 ```
 ## Contributing

{mlarray-0.0.52 → mlarray-0.0.53}/mlarray.egg-info/SOURCES.txt RENAMED

@@ -5,7 +5,6 @@ README.md
 mkdocs.yml
 pyproject.toml
 ./mlarray/__init__.py
-./mlarray/blosc2_layout_strategies.py
 ./mlarray/cli.py
 ./mlarray/meta.py
 ./mlarray/mlarray.py
@@ -13,11 +12,6 @@ pyproject.toml
 .github/workflows/workflow.yml
 assets/banner.png
 assets/banner.png~
-bench/.gitignore
-bench/README.md
-bench/bench_convert_nii_to_mla_random_read.py
-bench/bench_io_blosc2_layouts.py
-bench/helper/print_mla_layouts.py
 docs/api.md
 docs/cli.md
 docs/index.md
@@ -36,7 +30,6 @@ examples/example_non_spatial.py
 examples/example_open.py
 examples/example_save_load.py
 mlarray/__init__.py
-mlarray/blosc2_layout_strategies.py
 mlarray/cli.py
 mlarray/meta.py
 mlarray/mlarray.py
@@ -49,7 +42,6 @@ mlarray.egg-info/requires.txt
 mlarray.egg-info/top_level.txt
 tests/test_asarray.py
 tests/test_bboxes.py
-tests/test_cli.py
 tests/test_compress_decompress.py
 tests/test_constructors.py
 tests/test_create.py

{mlarray-0.0.52 → mlarray-0.0.53}/tests/test_optimization.py RENAMED

@@ -5,7 +5,6 @@ from pathlib import Path
 import numpy as np
 from mlarray import MLArray, MLARRAY_DEFAULT_PATCH_SIZE
-from mlarray.meta import MetaSpatial
 def _make_array(shape=(16, 32, 32), seed=0, dtype=np.float32):
@@ -112,43 +111,6 @@ class TestOptimizationExamples(unittest.TestCase):
             self.assertIsNotNone(loaded.meta.blosc2.chunk_size)
             self.assertIsNotNone(loaded.meta.blosc2.block_size)
-    def test_patch_optimization_supports_multiple_non_spatial_axes(self):
-        with tempfile.TemporaryDirectory() as tmpdir:
-            array = _make_array(shape=(2, 3, 16, 32, 32))
-            path = Path(tmpdir) / "multi-non-spatial.mla"
-            axis_labels = [
-                MetaSpatial.AxisLabel.channel,
-                MetaSpatial.AxisLabel.temporal,
-                MetaSpatial.AxisLabel.spatial_z,
-                MetaSpatial.AxisLabel.spatial_y,
-                MetaSpatial.AxisLabel.spatial_x,
-            ]
-            MLArray(array, axis_labels=axis_labels, patch_size=8).save(path)
-            loaded = MLArray(path)
-            self.assertEqual(loaded.meta.blosc2.patch_size, [8, 8, 8])
-            self.assertEqual(len(loaded.meta.blosc2.chunk_size), 5)
-            self.assertEqual(len(loaded.meta.blosc2.block_size), 5)
-            self.assertEqual(loaded.meta.blosc2.chunk_size[:2], [1, 1])
-            self.assertEqual(loaded.meta.blosc2.block_size[:2], [1, 1])
-    def test_patch_optimization_supports_more_than_three_spatial_axes(self):
-        array = _make_array(shape=(2, 6, 8, 10, 12))
-        axis_labels = [
-            MetaSpatial.AxisLabel.channel,
-            MetaSpatial.AxisLabel.spatial,
-            MetaSpatial.AxisLabel.spatial,
-            MetaSpatial.AxisLabel.spatial,
-            MetaSpatial.AxisLabel.spatial,
-        ]
-        image = MLArray(array, axis_labels=axis_labels, patch_size=(2, 4, 4, 6))
-        self.assertEqual(image.meta.blosc2.patch_size, [2, 4, 4, 6])
-        self.assertEqual(len(image.meta.blosc2.chunk_size), 5)
-        self.assertEqual(len(image.meta.blosc2.block_size), 5)
 if __name__ == "__main__":
     unittest.main()
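The two removed tests covered layouts (two non-spatial axes, four spatial axes) that appear to be rejected by the `NotImplementedError` check added to `mlarray/mlarray.py` in this release. A minimal sketch of that condition, assuming the mask is a list of booleans as in the diff; `patch_optimization_supported` is a hypothetical name used only here:

```python
def patch_optimization_supported(spatial_axis_mask):
    """Return True if the mask passes the 0.0.53 patch-size check.

    Hypothetical helper mirroring the condition in the diff: exactly one
    spatial axis, more than three spatial axes, or more than one
    non-spatial axis are all rejected.
    """
    num_spatial = sum(spatial_axis_mask)
    num_non_spatial = sum(not b for b in spatial_axis_mask)
    return not (num_spatial == 1 or num_spatial > 3 or num_non_spatial > 1)


# channel + temporal + 3 spatial axes (first removed test)
print(patch_optimization_supported([False, False, True, True, True]))
# channel + 4 spatial axes (second removed test)
print(patch_optimization_supported([False, True, True, True, True]))
# channel + 3 spatial axes
print(patch_optimization_supported([False, True, True, True]))
```

Note that, like the condition it mirrors, this sketch does not reject a mask with zero spatial axes.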

mlarray-0.0.52/bench/.gitignore DELETED

@@ -1,2 +0,0 @@
-data/
-results/

mlarray-0.0.52/bench/README.md DELETED

@@ -1,56 +0,0 @@
-# Benchmark Scripts
-This folder contains benchmarking scripts for MLArray IO/layout experiments.
-## `bench_io_blosc2_layouts.py`
-Benchmarks IO throughput across:
-- layout method(s) based on `comp_blosc2_params` (currently baseline copy only)
-- image size tiers (`small`, `medium`, `large`, `very_large`)
-- 2D / 3D / 4D-total array cases with spatial and optional non-spatial axis
-- multiple patch sizes (2D and 3D patch vectors)
-- `MLArray.open(...)` mode/mmap combinations
-- operations:
-  - `read_full`
-  - `read_patch_random`
-  - `write_patch_random`
-- warm and cold cache runs
-Outputs are printed to console and written to:
-- `bench/results/bench_io_blosc2_layouts.csv`
-- `bench/results/bench_io_blosc2_layouts.json`
-### Example
-```bash
-python bench/bench_io_blosc2_layouts.py \
-  --tiers small medium \
-  --runs 3 \
-  --cache-mode both \
-  --nthreads 1
-```
-If you hit native segfaults in Blosc2 during long runs, isolate each measured run
-in a subprocess (slower, but robust):
-```bash
-python bench/bench_io_blosc2_layouts.py \
-  --tiers small medium \
-  --runs 3 \
-  --cache-mode both \
-  --nthreads 1 \
-  --isolate-runs
-```
-### Cold cache note (Linux)
-For cold-cache read measurements, the script drops Linux page cache **after the dataset has been created on disk and immediately before measured open/read runs**.
-This requires root:
-- run as root, or
-- run via `sudo`
-If cache dropping fails, those runs are recorded with error status in results.
