mlarray 0.0.52__tar.gz → 0.0.53__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {mlarray-0.0.52 → mlarray-0.0.53}/PKG-INFO +2 -10
- {mlarray-0.0.52 → mlarray-0.0.53}/README.md +1 -9
- mlarray-0.0.53/docs/cli.md +29 -0
- mlarray-0.0.53/mlarray/cli.py +60 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/mlarray/mlarray.py +116 -49
- {mlarray-0.0.52 → mlarray-0.0.53}/mlarray.egg-info/PKG-INFO +2 -10
- {mlarray-0.0.52 → mlarray-0.0.53}/mlarray.egg-info/SOURCES.txt +0 -8
- {mlarray-0.0.52 → mlarray-0.0.53}/tests/test_optimization.py +0 -38
- mlarray-0.0.52/bench/.gitignore +0 -2
- mlarray-0.0.52/bench/README.md +0 -56
- mlarray-0.0.52/bench/bench_convert_nii_to_mla_random_read.py +0 -586
- mlarray-0.0.52/bench/bench_io_blosc2_layouts.py +0 -1178
- mlarray-0.0.52/bench/helper/print_mla_layouts.py +0 -85
- mlarray-0.0.52/docs/cli.md +0 -34
- mlarray-0.0.52/mlarray/blosc2_layout_strategies.py +0 -766
- mlarray-0.0.52/mlarray/cli.py +0 -133
- mlarray-0.0.52/tests/test_cli.py +0 -139
- {mlarray-0.0.52 → mlarray-0.0.53}/.github/workflows/workflow.yml +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/.gitignore +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/LICENSE +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/MANIFEST.in +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/assets/banner.png +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/assets/banner.png~ +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/docs/api.md +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/docs/index.md +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/docs/optimization.md +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/docs/schema.md +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/docs/usage.md +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/docs/why.md +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/examples/example_asarray.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/examples/example_bboxes_only.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/examples/example_channel.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/examples/example_compress_decompress.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/examples/example_compressed_vs_uncompressed.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/examples/example_in_memory_constructors.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/examples/example_metadata_only.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/examples/example_non_spatial.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/examples/example_open.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/examples/example_save_load.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/mkdocs.yml +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/mlarray/__init__.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/mlarray/meta.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/mlarray/utils.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/mlarray.egg-info/dependency_links.txt +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/mlarray.egg-info/entry_points.txt +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/mlarray.egg-info/requires.txt +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/mlarray.egg-info/top_level.txt +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/pyproject.toml +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/setup.cfg +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/tests/test_asarray.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/tests/test_bboxes.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/tests/test_compress_decompress.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/tests/test_constructors.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/tests/test_create.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/tests/test_meta_safety.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/tests/test_metadata.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/tests/test_open.py +0 -0
- {mlarray-0.0.52 → mlarray-0.0.53}/tests/test_usage.py +0 -0

{mlarray-0.0.52 → mlarray-0.0.53}/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: mlarray
-Version: 0.0.
+Version: 0.0.53
 Summary: Array format specialized for Machine Learning with Blosc2 backend and standardized metadata.
 Author-email: Karol Gotkowski <karol.gotkowski@dkfz.de>
 License: MIT
@@ -236,18 +236,10 @@ mlarray_header sample.mla
 
 ### mlarray_convert
 
-Convert
-
-When converting from NIfTI/NRRD to MLArray, source metadata is copied into
-`meta.source`.
-
-When converting from MLArray to NIfTI/NRRD, only `meta.source` is copied into
-the output header. Spatial metadata (`spacing`, `origin`, `direction`) is set
-explicitly from `meta.spatial`.
+Convert a NIfTI or NRRD file to MLArray and copy metadata.
 
 ```bash
 mlarray_convert sample.nii.gz output.mla
-mlarray_convert sample.mla output.nii.gz
 ```
 
 ## Contributing

{mlarray-0.0.52 → mlarray-0.0.53}/README.md
@@ -202,18 +202,10 @@ mlarray_header sample.mla
 
 ### mlarray_convert
 
-Convert
-
-When converting from NIfTI/NRRD to MLArray, source metadata is copied into
-`meta.source`.
-
-When converting from MLArray to NIfTI/NRRD, only `meta.source` is copied into
-the output header. Spatial metadata (`spacing`, `origin`, `direction`) is set
-explicitly from `meta.spatial`.
+Convert a NIfTI or NRRD file to MLArray and copy metadata.
 
 ```bash
 mlarray_convert sample.nii.gz output.mla
-mlarray_convert sample.mla output.nii.gz
 ```
 
 ## Contributing

mlarray-0.0.53/docs/cli.md
@@ -0,0 +1,29 @@
+# CLI
+
+MLArray includes a small command-line interface for common tasks such as **inspecting file headers** and **converting existing image formats** into MLArray. This is especially useful when you want to quickly verify metadata, debug a dataset, or batch-convert files without writing Python code.
+
+The CLI currently focuses on core workflows (header inspection and conversion). Support for converting a wider range of image formats will be added over time.
+
+---
+
+## `mlarray_header`
+
+Print the metadata header from a `.mla` file.
+
+This command is useful for quickly checking spatial metadata, stored schemas, and other file-level information without loading the full array into memory.
+
+```bash
+mlarray_header sample.mla
+```
+
+---
+
+## `mlarray_convert`
+
+Convert a NIfTI or NRRD file to MLArray and copy metadata.
+
+This provides an easy way to bring existing medical imaging data into an MLArray-based workflow while preserving the original metadata for downstream analysis and visualization.
+
+```bash
+mlarray_convert sample.nii.gz output.mla
+```

mlarray-0.0.53/mlarray/cli.py
@@ -0,0 +1,60 @@
+import argparse
+import json
+from typing import Union
+from pathlib import Path
+from mlarray import MLArray
+from mlarray.meta import _meta_internal_write
+
+try:
+    from medvol import MedVol
+except ImportError:
+    MedVol = None
+
+
+def print_header(filepath: Union[str, Path]) -> None:
+    """Print the MLArray metadata header for a file.
+
+    Args:
+        filepath: Path to a ".mla" file.
+    """
+    meta = MLArray(filepath).meta
+    if meta is None:
+        print("null")
+        return
+    print(json.dumps(meta.to_plain(include_none=True), indent=2, sort_keys=True))
+
+
+def convert_to_mlarray(load_filepath: Union[str, Path], save_filepath: Union[str, Path]):
+    if MedVol is None:
+        raise RuntimeError("medvol is required for mlarray_convert; install with 'pip install mlarray[all]'.")
+    image_meta_format = None
+    if str(load_filepath).endswith(f".nii.gz") or str(load_filepath).endswith(f".nii"):
+        image_meta_format = "nifti"
+    elif str(load_filepath).endswith(f".nrrd"):
+        image_meta_format = "nrrd"
+    image_medvol = MedVol(load_filepath)
+    image_mlarray = MLArray(image_medvol.array, spacing=image_medvol.spacing, origin=image_medvol.origin, direction=image_medvol.direction, meta=image_medvol.header)
+    with _meta_internal_write():
+        image_mlarray.meta._image_meta_format = image_meta_format
+    image_mlarray.save(save_filepath)
+
+
+def cli_print_header() -> None:
+    parser = argparse.ArgumentParser(
+        prog="mlarray_header",
+        description="Print the MLArray metadata header for a file.",
+    )
+    parser.add_argument("filepath", help="Path to a .mla file.")
+    args = parser.parse_args()
+    print_header(args.filepath)
+
+
+def cli_convert_to_mlarray() -> None:
+    parser = argparse.ArgumentParser(
+        prog="mlarray_convert",
+        description="Convert a NiFTi or NRRD file to MLArray and copy all metadata.",
+    )
+    parser.add_argument("load_filepath", help="Path to the NiFTi (.nii.gz, .nii) or NRRD (.nrrd) file to load.")
+    parser.add_argument("save_filepath", help="Path to the MLArray (.mla) file to save.")
+    args = parser.parse_args()
+    convert_to_mlarray(args.load_filepath, args.save_filepath)
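The extension check in `convert_to_mlarray` can be isolated into a small pure function, which is easier to test than the full conversion (that path needs `medvol` and real image files). The sketch below uses a hypothetical helper name, `detect_image_meta_format`, which is not part of mlarray's API. It mirrors the released suffix logic, including the detail that any other extension yields `None` instead of an error; in that case the released code still hands the file to `MedVol`.

```python
from pathlib import Path
from typing import Optional, Union


def detect_image_meta_format(load_filepath: Union[str, Path]) -> Optional[str]:
    """Hypothetical helper mirroring the suffix check in convert_to_mlarray.

    Returns "nifti" for .nii / .nii.gz, "nrrd" for .nrrd, and None otherwise
    (the released code does not reject unknown extensions).
    """
    name = str(load_filepath)
    if name.endswith(".nii.gz") or name.endswith(".nii"):
        return "nifti"
    if name.endswith(".nrrd"):
        return "nrrd"
    return None


for path in ["scan.nii.gz", "scan.nii", Path("volume.nrrd"), "image.mha"]:
    print(path, "->", detect_image_meta_format(path))
```

Like the released check, this comparison is case-sensitive, so a file named `SCAN.NII.GZ` would map to `None`.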

{mlarray-0.0.52 → mlarray-0.0.53}/mlarray/mlarray.py
@@ -12,9 +12,6 @@ from mlarray.meta import (
     _spatial_axis_mask,
     _meta_internal_write,
 )
-from mlarray.blosc2_layout_strategies import (
-    comp_blosc2_params_spatial_only_magnitude,
-)
 from mlarray.utils import is_serializable
 import pickle
 import gzip
@@ -1327,8 +1324,8 @@ class MLArray:
     @classmethod
     def comp_blosc2_params(
         cls,
-        image_size: Tuple[int,
-        patch_size: Tuple[int,
+        image_size: Union[Tuple[int, int], Tuple[int, int, int], Tuple[int, int, int, int]],
+        patch_size: Union[Tuple[int, int], Tuple[int, int, int]],
         spatial_axis_mask: Optional[list[bool]] = None,
         bytes_per_pixel: int = 4, # 4 byte are float32
         l1_cache_size_per_core_in_bytes: int = 32768, # 1 Kibibyte (KiB) = 2^10 Byte; 32 KiB = 32768 Byte
@@ -1336,35 +1333,31 @@ class MLArray:
         safety_factor: float = 0.8 # we dont will the caches to the brim. 0.8 means we target 80% of the caches
     ):
         """
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-        5. Enforce structural constraints that keep the layout regular:
-           non-clipped spatial axes stay even, and non-clipped chunk axes
-           remain multiples of their corresponding block axes.
-
-        This strategy supports arbitrary numbers of spatial and non-spatial axes
-        as long as the patch size dimensionality matches the number of spatial axes.
+        Computes a recommended block and chunk size for saving arrays with Blosc v2.
+
+        Blosc2 NDIM documentation:
+        "Having a second partition allows for greater flexibility in fitting different partitions to different CPU cache levels.
+        Typically, the first partition (also known as chunks) should be sized to fit within the L3 cache,
+        while the second partition (also known as blocks) should be sized to fit within the L2 or L1 caches,
+        depending on whether the priority is compression ratio or speed."
+        (Source: https://www.blosc.org/posts/blosc2-ndim-intro/)
+
+        Our approach is not fully optimized for this yet.
+        Currently, we aim to fit the uncompressed block within the L1 cache, accepting that it might occasionally spill over into L2, which we consider acceptable.
+
+        Note: This configuration is specifically optimized for nnU-Net data loading, where each read operation is performed by a single core, so multi-threading is not an option.
+
+        The default cache values are based on an older Intel 4110 CPU with 32KB L1, 128KB L2, and 1408KB L3 cache per core.
+        We haven't further optimized for modern CPUs with larger caches, as our data must still be compatible with the older systems.
 
         Args:
-            image_size (Tuple[int,
-
-
-
-
+            image_size (Union[Tuple[int, int], Tuple[int, int, int], Tuple[int, int, int, int]]):
+                Image shape. Use a 2D, 3D, or 4D size; 2D/3D inputs are
+                internally expanded to 4D (with non-spatial axes first).
+            patch_size (Union[Tuple[int, int], Tuple[int, int, int]]): Patch
+                size for spatial dimensions. Use a 2-tuple (x, y) or 3-tuple
+                (x, y, z).
+            spatial_axis_mask (Optional[list[bool]]): Mask indicating for every axis whether it is spatial or not.
             bytes_per_pixel (int): Number of bytes per element. Defaults to 4
                 for float32.
             l1_cache_size_per_core_in_bytes (int): L1 cache per core in bytes.
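The docstring above states the sizing goal: blocks should fit the per-core L1 cache and chunks the L3 cache, each scaled by `safety_factor`. The following is a simplified, pure-Python sketch of the two loops in the new body, not the library's implementation. It assumes spatial axes only (the released code also keeps a non-spatial axis at index 0 and counts it in the byte estimate), skips clipping to the image size while shrinking, breaks ties by the lowest axis index, and uses an L3 budget of 1408 × 1024 bytes. That budget is an assumption: the `l3_cache_size_per_core_in_bytes` default is not visible in this diff, and 1408 KB is only the per-core figure quoted for the Intel 4110. The function names are hypothetical. Shrinking halves an axis because, for power-of-two values without clipping, the released `2 ** floor(log2(b - 1))` step gives the same result.

```python
import math
from typing import List, Sequence


def next_pow2(n: int) -> int:
    """Smallest power of two >= n, for n >= 1."""
    return 2 ** max(0, math.ceil(math.log2(n)))


def sketch_block_size(patch_size: Sequence[int], bytes_per_pixel: int = 4,
                      l1_bytes: int = 32768, safety_factor: float = 0.8) -> List[int]:
    # Start from the patch rounded up to a power of two on every axis.
    block = [next_pow2(p) for p in patch_size]
    while math.prod(block) * bytes_per_pixel > l1_bytes * safety_factor:
        shrinkable = [i for i, b in enumerate(block) if b > 1]
        if not shrinkable:
            break  # nothing left to shrink
        # Halve the axis whose block is largest relative to the patch.
        axis = max(shrinkable, key=lambda i: block[i] / patch_size[i])
        block[axis] //= 2
    return block


def sketch_chunk_size(image_size: Sequence[int], patch_size: Sequence[int],
                      block_size: Sequence[int], bytes_per_pixel: int = 4,
                      l3_bytes: int = 1408 * 1024, safety_factor: float = 0.8) -> List[int]:
    chunk = list(block_size)
    while math.prod(chunk) * bytes_per_pixel < l3_bytes * safety_factor:
        # Only grow axes that are below the image size and not singleton in the patch.
        growable = [i for i in range(len(chunk))
                    if chunk[i] < image_size[i] and patch_size[i] > 1]
        if not growable:
            break
        # Add one block along the axis with the fewest blocks tiled so far.
        axis = min(growable, key=lambda i: chunk[i] / block_size[i])
        previous = chunk[axis]
        chunk[axis] = min(chunk[axis] + block_size[axis], image_size[axis])
        # Keep the chunk within 1.5x the patch size on average; undo the step otherwise.
        if sum(c / p for c, p in zip(chunk, patch_size)) / len(chunk) > 1.5:
            chunk[axis] = previous
            break
    return chunk


block = sketch_block_size((128, 128, 128))
chunk = sketch_chunk_size((256, 256, 256), (128, 128, 128), block)
print(block, chunk)  # [16, 16, 16] [80, 64, 64]
```

With these budgets a `(128, 128, 128)` float32 patch shrinks to a 16 × 16 × 16 block (16,384 bytes, under the roughly 26,214-byte L1 budget). As in the released loop, the budget is checked before each growth step, so the final chunk can overshoot by one step: 80 × 64 × 64 × 4 bytes = 1,310,720 bytes against a budget of about 1,153,434 bytes.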
@@ -1374,15 +1367,93 @@ class MLArray:
         Returns:
             Tuple[List[int], List[int]]: Recommended chunk size and block size.
         """
-
-
-
-
-
-
-
-
-
+        def _move_index_list(a, src, dst):
+            a = list(a)
+            x = a.pop(src)
+            a.insert(dst, x)
+            return a
+
+        num_squeezes = 0
+        if len(image_size) == 2:
+            image_size = (1, 1, *image_size)
+            num_squeezes = 2
+        elif len(image_size) == 3:
+            image_size = (1, *image_size)
+            num_squeezes = 1
+
+        non_spatial_axis = None
+        if spatial_axis_mask is not None:
+            non_spatial_axis_mask = [not b for b in spatial_axis_mask]
+            if sum(non_spatial_axis_mask) > 1:
+                raise RuntimeError("Automatic blosc2 optimization currently only supports one non-spatial axis. Please set chunk and block size manually.")
+            non_spatial_axis = next((i for i, v in enumerate(non_spatial_axis_mask) if v), None)
+            if non_spatial_axis is not None:
+                image_size = _move_index_list(image_size, non_spatial_axis+num_squeezes, 0)
+
+        if len(image_size) != 4:
+            raise RuntimeError("Image size must be 4D.")
+
+        if not (len(patch_size) == 2 or len(patch_size) == 3):
+            raise RuntimeError("Patch size must be 2D or 3D.")
+
+        non_spatial_size = image_size[0]
+        if len(patch_size) == 2:
+            patch_size = [1, *patch_size]
+        patch_size = np.array(patch_size)
+        block_size = np.array((non_spatial_size, *[2 ** (max(0, math.ceil(math.log2(i)))) for i in patch_size]))
+
+        # shrink the block size until it fits in L1
+        estimated_nbytes_block = np.prod(block_size) * bytes_per_pixel
+        while estimated_nbytes_block > (l1_cache_size_per_core_in_bytes * safety_factor):
+            # pick largest deviation from patch_size that is not 1
+            axis_order = np.argsort(block_size[1:] / patch_size)[::-1]
+            idx = 0
+            picked_axis = axis_order[idx]
+            while block_size[picked_axis + 1] == 1 or block_size[picked_axis + 1] == 1:
+                idx += 1
+                picked_axis = axis_order[idx]
+            # now reduce that axis to the next lowest power of 2
+            block_size[picked_axis + 1] = 2 ** (max(0, math.floor(math.log2(block_size[picked_axis + 1] - 1))))
+            block_size[picked_axis + 1] = min(block_size[picked_axis + 1], image_size[picked_axis + 1])
+            estimated_nbytes_block = np.prod(block_size) * bytes_per_pixel
+
+        block_size = np.array([min(i, j) for i, j in zip(image_size, block_size)])
+
+        # note: there is no use extending the chunk size to 3d when we have a 2d patch size! This would unnecessarily
+        # load data into L3
+        # now tile the blocks into chunks until we hit image_size or the l3 cache per core limit
+        chunk_size = deepcopy(block_size)
+        estimated_nbytes_chunk = np.prod(chunk_size) * bytes_per_pixel
+        while estimated_nbytes_chunk < (l3_cache_size_per_core_in_bytes * safety_factor):
+            if patch_size[0] == 1 and all([i == j for i, j in zip(chunk_size[2:], image_size[2:])]):
+                break
+            if all([i == j for i, j in zip(chunk_size, image_size)]):
+                break
+            # find axis that deviates from block_size the most
+            axis_order = np.argsort(chunk_size[1:] / block_size[1:])
+            idx = 0
+            picked_axis = axis_order[idx]
+            while chunk_size[picked_axis + 1] == image_size[picked_axis + 1] or patch_size[picked_axis] == 1:
+                idx += 1
+                picked_axis = axis_order[idx]
+            chunk_size[picked_axis + 1] += block_size[picked_axis + 1]
+            chunk_size[picked_axis + 1] = min(chunk_size[picked_axis + 1], image_size[picked_axis + 1])
+            estimated_nbytes_chunk = np.prod(chunk_size) * bytes_per_pixel
+            if np.mean([i / j for i, j in zip(chunk_size[1:], patch_size)]) > 1.5:
+                # chunk size should not exceed patch size * 1.5 on average
+                chunk_size[picked_axis + 1] -= block_size[picked_axis + 1]
+                break
+        # better safe than sorry
+        chunk_size = [min(i, j) for i, j in zip(image_size, chunk_size)]
+
+        if non_spatial_axis is not None:
+            block_size = _move_index_list(block_size, 0, non_spatial_axis+num_squeezes)
+            chunk_size = _move_index_list(chunk_size, 0, non_spatial_axis+num_squeezes)
+
+        block_size = block_size[num_squeezes:]
+        chunk_size = chunk_size[num_squeezes:]
+
+        return [int(value) for value in chunk_size], [int(value) for value in block_size]
 
     def _open(
         self,
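The new body of `comp_blosc2_params` normalizes its input before sizing: 2D and 3D shapes are padded with leading singleton axes (counted in `num_squeezes`), and the single non-spatial axis, if any, is moved to the front with `_move_index_list`. After sizing, the results are moved back and the padding is sliced off. The round trip can be sketched on its own as below; `to_canonical_4d` and `from_canonical_4d` are hypothetical names that mirror that logic, not functions mlarray exposes.

```python
from typing import List, Optional, Sequence, Tuple


def move_index(values: Sequence[int], src: int, dst: int) -> List[int]:
    """Same idea as _move_index_list: pop the item at src, insert it at dst."""
    out = list(values)
    out.insert(dst, out.pop(src))
    return out


def to_canonical_4d(shape: Sequence[int], spatial_axis_mask: Optional[Sequence[bool]] = None
                    ) -> Tuple[List[int], Optional[int], int]:
    # Pad 2D/3D shapes with leading singleton axes.
    num_squeezes = 4 - len(shape) if len(shape) in (2, 3) else 0
    canonical = [1] * num_squeezes + list(shape)
    non_spatial_axis = None
    if spatial_axis_mask is not None:
        non_spatial = [i for i, is_spatial in enumerate(spatial_axis_mask) if not is_spatial]
        if len(non_spatial) > 1:
            raise RuntimeError("Only one non-spatial axis is supported.")
        if non_spatial:
            non_spatial_axis = non_spatial[0]
            # Offset by the padding, then move the non-spatial axis to the front.
            canonical = move_index(canonical, non_spatial_axis + num_squeezes, 0)
    if len(canonical) != 4:
        raise RuntimeError("Image size must be 4D.")
    return canonical, non_spatial_axis, num_squeezes


def from_canonical_4d(values: Sequence[int], non_spatial_axis: Optional[int],
                      num_squeezes: int) -> List[int]:
    # Undo the move, then drop the padding axes.
    restored = list(values)
    if non_spatial_axis is not None:
        restored = move_index(restored, 0, non_spatial_axis + num_squeezes)
    return restored[num_squeezes:]


canonical, axis, squeezes = to_canonical_4d((64, 64, 3), [True, True, False])
print(canonical, axis, squeezes)                     # [3, 1, 64, 64] 2 1
print(from_canonical_4d(canonical, axis, squeezes))  # [64, 64, 3]
```

In this example, the trailing channel axis ends up first and the padding singleton lands at position 1, so the sizing loops see it as a spatial axis of length 1.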
@@ -1814,6 +1885,9 @@ class MLArray:
             MetaBlosc2: Validated Blosc2 metadata instance.
         """
         num_spatial_axes = sum(spatial_axis_mask)
+        num_non_spatial_axes = sum([not b for b in spatial_axis_mask])
+        if patch_size is not None and patch_size != "default" and (num_spatial_axes == 1 or num_spatial_axes > 3 or num_non_spatial_axes > 1):
+            raise NotImplementedError("Chunk and block size optimization based on patch size is only implemented for 2D and 3D spatial images with at most one further non-spatial axis. Please set the chunk and block size manually or set to None for blosc2 to determine a chunk and block size.")
         if patch_size is not None and patch_size != "default" and (chunk_size is not None or block_size is not None):
             raise RuntimeError("patch_size and chunk_size / block_size cannot both be explicitly set.")
         if (chunk_size is not None and block_size is None) or (chunk_size is None and block_size is not None):
@@ -1830,14 +1904,7 @@ class MLArray:
         if chunk_size is not None or block_size is not None:
             patch_size = None
 
-        patch_size = [patch_size] *
-
-        if patch_size is not None and num_spatial_axes == 0:
-            raise RuntimeError(
-                "Automatic patch-size optimization requires at least one spatial axis. "
-                "Set patch_size=None and provide chunk_size/block_size manually, "
-                "or let Blosc2 determine the layout."
-            )
+        patch_size = [patch_size] * len(shape) if isinstance(patch_size, int) else patch_size
 
         if patch_size is not None:
             chunk_size, block_size = MLArray.comp_blosc2_params(shape, patch_size, spatial_axis_mask, bytes_per_pixel=dtype_itemsize)
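The new guard accepts patch-based optimization only for two or three spatial axes plus at most one non-spatial axis. This is consistent with the removal, later in this diff, of the tests for multiple non-spatial axes and for more than three spatial axes from `tests/test_optimization.py`. A minimal sketch of the same condition as a standalone predicate follows; `patch_optimization_supported` is a hypothetical name. As the last case shows, the condition itself does not reject an array with zero spatial axes, whereas 0.0.52 had a dedicated `num_spatial_axes == 0` check that this release removes.

```python
from typing import Sequence


def patch_optimization_supported(spatial_axis_mask: Sequence[bool]) -> bool:
    """Hypothetical predicate mirroring the NotImplementedError guard added in 0.0.53."""
    num_spatial_axes = sum(bool(b) for b in spatial_axis_mask)
    num_non_spatial_axes = len(spatial_axis_mask) - num_spatial_axes
    # The released code raises when any of these holds and a patch size is requested.
    unsupported = num_spatial_axes == 1 or num_spatial_axes > 3 or num_non_spatial_axes > 1
    return not unsupported


cases = {
    "3 spatial": [True, True, True],
    "channel + 3 spatial": [False, True, True, True],
    "2 non-spatial + 3 spatial": [False, False, True, True, True],
    "channel + 4 spatial": [False, True, True, True, True],
    "1 spatial": [True],
    "0 spatial, 1 non-spatial": [False],
}
for name, mask in cases.items():
    print(f"{name}: {patch_optimization_supported(mask)}")
```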

{mlarray-0.0.52 → mlarray-0.0.53}/mlarray.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: mlarray
-Version: 0.0.
+Version: 0.0.53
 Summary: Array format specialized for Machine Learning with Blosc2 backend and standardized metadata.
 Author-email: Karol Gotkowski <karol.gotkowski@dkfz.de>
 License: MIT
@@ -236,18 +236,10 @@ mlarray_header sample.mla
 
 ### mlarray_convert
 
-Convert
-
-When converting from NIfTI/NRRD to MLArray, source metadata is copied into
-`meta.source`.
-
-When converting from MLArray to NIfTI/NRRD, only `meta.source` is copied into
-the output header. Spatial metadata (`spacing`, `origin`, `direction`) is set
-explicitly from `meta.spatial`.
+Convert a NIfTI or NRRD file to MLArray and copy metadata.
 
 ```bash
 mlarray_convert sample.nii.gz output.mla
-mlarray_convert sample.mla output.nii.gz
 ```
 
 ## Contributing

{mlarray-0.0.52 → mlarray-0.0.53}/mlarray.egg-info/SOURCES.txt
@@ -5,7 +5,6 @@ README.md
 mkdocs.yml
 pyproject.toml
 ./mlarray/__init__.py
-./mlarray/blosc2_layout_strategies.py
 ./mlarray/cli.py
 ./mlarray/meta.py
 ./mlarray/mlarray.py
@@ -13,11 +12,6 @@ pyproject.toml
 .github/workflows/workflow.yml
 assets/banner.png
 assets/banner.png~
-bench/.gitignore
-bench/README.md
-bench/bench_convert_nii_to_mla_random_read.py
-bench/bench_io_blosc2_layouts.py
-bench/helper/print_mla_layouts.py
 docs/api.md
 docs/cli.md
 docs/index.md
@@ -36,7 +30,6 @@ examples/example_non_spatial.py
 examples/example_open.py
 examples/example_save_load.py
 mlarray/__init__.py
-mlarray/blosc2_layout_strategies.py
 mlarray/cli.py
 mlarray/meta.py
 mlarray/mlarray.py
@@ -49,7 +42,6 @@ mlarray.egg-info/requires.txt
 mlarray.egg-info/top_level.txt
 tests/test_asarray.py
 tests/test_bboxes.py
-tests/test_cli.py
 tests/test_compress_decompress.py
 tests/test_constructors.py
 tests/test_create.py

{mlarray-0.0.52 → mlarray-0.0.53}/tests/test_optimization.py
@@ -5,7 +5,6 @@ from pathlib import Path
 import numpy as np
 
 from mlarray import MLArray, MLARRAY_DEFAULT_PATCH_SIZE
-from mlarray.meta import MetaSpatial
 
 
 def _make_array(shape=(16, 32, 32), seed=0, dtype=np.float32):
@@ -112,43 +111,6 @@ class TestOptimizationExamples(unittest.TestCase):
             self.assertIsNotNone(loaded.meta.blosc2.chunk_size)
             self.assertIsNotNone(loaded.meta.blosc2.block_size)
 
-    def test_patch_optimization_supports_multiple_non_spatial_axes(self):
-        with tempfile.TemporaryDirectory() as tmpdir:
-            array = _make_array(shape=(2, 3, 16, 32, 32))
-            path = Path(tmpdir) / "multi-non-spatial.mla"
-            axis_labels = [
-                MetaSpatial.AxisLabel.channel,
-                MetaSpatial.AxisLabel.temporal,
-                MetaSpatial.AxisLabel.spatial_z,
-                MetaSpatial.AxisLabel.spatial_y,
-                MetaSpatial.AxisLabel.spatial_x,
-            ]
-
-            MLArray(array, axis_labels=axis_labels, patch_size=8).save(path)
-            loaded = MLArray(path)
-
-            self.assertEqual(loaded.meta.blosc2.patch_size, [8, 8, 8])
-            self.assertEqual(len(loaded.meta.blosc2.chunk_size), 5)
-            self.assertEqual(len(loaded.meta.blosc2.block_size), 5)
-            self.assertEqual(loaded.meta.blosc2.chunk_size[:2], [1, 1])
-            self.assertEqual(loaded.meta.blosc2.block_size[:2], [1, 1])
-
-    def test_patch_optimization_supports_more_than_three_spatial_axes(self):
-        array = _make_array(shape=(2, 6, 8, 10, 12))
-        axis_labels = [
-            MetaSpatial.AxisLabel.channel,
-            MetaSpatial.AxisLabel.spatial,
-            MetaSpatial.AxisLabel.spatial,
-            MetaSpatial.AxisLabel.spatial,
-            MetaSpatial.AxisLabel.spatial,
-        ]
-
-        image = MLArray(array, axis_labels=axis_labels, patch_size=(2, 4, 4, 6))
-
-        self.assertEqual(image.meta.blosc2.patch_size, [2, 4, 4, 6])
-        self.assertEqual(len(image.meta.blosc2.chunk_size), 5)
-        self.assertEqual(len(image.meta.blosc2.block_size), 5)
-
 
 if __name__ == "__main__":
     unittest.main()

mlarray-0.0.52/bench/.gitignore
DELETED
mlarray-0.0.52/bench/README.md
DELETED
@@ -1,56 +0,0 @@
-# Benchmark Scripts
-
-This folder contains benchmarking scripts for MLArray IO/layout experiments.
-
-## `bench_io_blosc2_layouts.py`
-
-Benchmarks IO throughput across:
-
-- layout method(s) based on `comp_blosc2_params` (currently baseline copy only)
-- image size tiers (`small`, `medium`, `large`, `very_large`)
-- 2D / 3D / 4D-total array cases with spatial and optional non-spatial axis
-- multiple patch sizes (2D and 3D patch vectors)
-- `MLArray.open(...)` mode/mmap combinations
-- operations:
-  - `read_full`
-  - `read_patch_random`
-  - `write_patch_random`
-- warm and cold cache runs
-
-Outputs are printed to console and written to:
-
-- `bench/results/bench_io_blosc2_layouts.csv`
-- `bench/results/bench_io_blosc2_layouts.json`
-
-### Example
-
-```bash
-python bench/bench_io_blosc2_layouts.py \
---tiers small medium \
---runs 3 \
---cache-mode both \
---nthreads 1
-```
-
-If you hit native segfaults in Blosc2 during long runs, isolate each measured run
-in a subprocess (slower, but robust):
-
-```bash
-python bench/bench_io_blosc2_layouts.py \
---tiers small medium \
---runs 3 \
---cache-mode both \
---nthreads 1 \
---isolate-runs
-```
-
-### Cold cache note (Linux)
-
-For cold-cache read measurements, the script drops Linux page cache **after the dataset has been created on disk and immediately before measured open/read runs**.
-
-This requires root:
-
-- run as root, or
-- run via `sudo`
-
-If cache dropping fails, those runs are recorded with error status in results.