PyPI - patchworks - Versions diffs - 0.5.0__tar.gz → 0.6.0__tar.gz - Mend

patchworks 0.5.0.tar.gz → 0.6.0.tar.gz


Files changed (50)

{patchworks-0.5.0 → patchworks-0.6.0}/.github/workflows/release.yml RENAMED

@@ -48,3 +48,17 @@ jobs:
       - name: Publish to PyPI
         uses: pypa/gh-action-pypi-publish@release/v1
+  # Rebuild the org-wide pdoc apidocs site so it picks up the new version.
+  apidocs:
+    needs: release
+    runs-on: ubuntu-latest
+    steps:
+      - name: Trigger imcf.github.io apidocs rebuild
+        uses: peter-evans/repository-dispatch@v3
+        with:
+          # Fine-grained PAT with "Contents: write" on imcf/imcf.github.io,
+          # stored as the APIDOCS_DISPATCH_TOKEN secret in this repo.
+          token: ${{ secrets.APIDOCS_DISPATCH_TOKEN }}
+          repository: imcf/imcf.github.io
+          event-type: dispatch-event

{patchworks-0.5.0 → patchworks-0.6.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: patchworks
-Version: 0.5.0
+Version: 0.6.0
 Summary: Tiled processing of arbitrarily large images with globally consistent labels
 Project-URL: Homepage, https://github.com/imcf/patchworks
 Project-URL: Issues, https://github.com/imcf/patchworks/issues
@@ -127,11 +127,15 @@ def my_fn(tile):
     return label(tile > threshold_otsu(tile)).astype("int32")
-result = tile_process("image.zarr", my_fn, compute=True)
+result = tile_process("image.zarr", my_fn)
 ```
-Done. `result` is a NumPy array of integer labels, same spatial shape as the
-input, with globally unique IDs across all tiles.
+Done. `result` is a **lazy dask array** of integer labels (call `.compute()`
+for a NumPy array), same spatial shape as the input, with globally unique IDs
+across all tiles. By default the labels are also written **into the input
+store** at `image.zarr/labels/labels/` as a multi-scale pyramid, so the image
+and its segmentation live in one OME-ZARR. Pass `write_to="labels.zarr"` to
+write a separate store instead.
 ---
@@ -203,6 +207,26 @@ tile_process("image.zarr", my_custom_fn, tile_shape=(1, 512, 512))
 ---
+## Convert to OME-ZARR & view in napari
+Optional plugins close the loop: convert any image (Imaris `.ims`, CZI, LIF,
+ND2, OME-TIFF, … via bioio) to a pyramidal, **calibrated** OME-ZARR, then view
+the image and its labels in napari.
+```python
+from patchworks.plugins.ome_zarr import to_ome_zarr
+from patchworks.plugins.napari import view_in_napari
+to_ome_zarr("scan.ims", "scan.zarr")          # lazy, OOM-safe, keeps µm calibration
+view_in_napari("scan.zarr", labels="scan.zarr/labels/labels")
+```
+Pyramids downsample **X/Y only** (Z kept full-res) and are built level-by-level
+from disk, so terabyte volumes convert in bounded RAM. See the
+[OME-ZARR & napari guide](https://imcf.one/patchworks/guide/ome_zarr_napari/).
+---
 ## Common patterns
 ### Auto-size tiles from available memory
@@ -288,8 +312,8 @@ merged = merge_tile_labels(
 ## How tiling and merging work
-See [docs/how-it-works.md](docs/how-it-works.md) for a full explanation.
-Short version:
+See the [Merging labels guide](https://imcf.one/patchworks/guide/merging/) for
+a full explanation. Short version:
 1. Image is split into tiles (with optional overlap for boundary context).
 2. Your function is called independently on each tile. Dask handles parallelism
@@ -319,10 +343,15 @@ tiles where the dask-image approach stalls.
 ## Documentation
-- [Quick Start](docs/quickstart.md)
-- [API Reference](docs/api-reference.md)
-- [How It Works](docs/how-it-works.md)
-- [Examples](docs/examples/)
+Full docs, guides and tutorials: **<https://imcf.one/patchworks/>**
+- [Getting Started](https://imcf.one/patchworks/getting_started/)
+- [User Guide](https://imcf.one/patchworks/guide/tiling/) — tiling, merging,
+  empty-tile skipping, GPU/distributed, OME-ZARR & napari, pitfalls
+- [Examples](https://imcf.one/patchworks/examples/cellpose_2d/) — Cellpose,
+  StarDist, custom functions, standalone merge
+- [API Reference](https://imcf.one/patchworks/api/tile_process/) ·
+  [pdoc API](https://imcf.one/apidocs/patchworks/)
 ---
@@ -335,7 +364,11 @@ Optional:
 - `psutil` — accurate RAM sizing for `tile_shape="auto"`
 - `nvidia-ml-py` — accurate GPU VRAM sizing
 - `tqdm` — progress bars
-- `cellpose` — Cellpose plugin
+- `cellpose` — Cellpose plugin (`patchworks[cellpose]`)
+- `bioio` + readers — convert CZI/LIF/ND2/OME-TIFF/… to OME-ZARR
+  (`patchworks[bioio]`)
+- `imaris-ims-file-reader` — convert Imaris `.ims` (`patchworks[imaris]`)
+- `napari` — interactive viewer plugin (`patchworks[napari]`)
 ---

{patchworks-0.5.0 → patchworks-0.6.0}/README.md RENAMED

@@ -61,11 +61,15 @@ def my_fn(tile):
     return label(tile > threshold_otsu(tile)).astype("int32")
-result = tile_process("image.zarr", my_fn, compute=True)
+result = tile_process("image.zarr", my_fn)
 ```
-Done. `result` is a NumPy array of integer labels, same spatial shape as the
-input, with globally unique IDs across all tiles.
+Done. `result` is a **lazy dask array** of integer labels (call `.compute()`
+for a NumPy array), same spatial shape as the input, with globally unique IDs
+across all tiles. By default the labels are also written **into the input
+store** at `image.zarr/labels/labels/` as a multi-scale pyramid, so the image
+and its segmentation live in one OME-ZARR. Pass `write_to="labels.zarr"` to
+write a separate store instead.
 ---
@@ -137,6 +141,26 @@ tile_process("image.zarr", my_custom_fn, tile_shape=(1, 512, 512))
 ---
+## Convert to OME-ZARR & view in napari
+Optional plugins close the loop: convert any image (Imaris `.ims`, CZI, LIF,
+ND2, OME-TIFF, … via bioio) to a pyramidal, **calibrated** OME-ZARR, then view
+the image and its labels in napari.
+```python
+from patchworks.plugins.ome_zarr import to_ome_zarr
+from patchworks.plugins.napari import view_in_napari
+to_ome_zarr("scan.ims", "scan.zarr")          # lazy, OOM-safe, keeps µm calibration
+view_in_napari("scan.zarr", labels="scan.zarr/labels/labels")
+```
+Pyramids downsample **X/Y only** (Z kept full-res) and are built level-by-level
+from disk, so terabyte volumes convert in bounded RAM. See the
+[OME-ZARR & napari guide](https://imcf.one/patchworks/guide/ome_zarr_napari/).
+---
 ## Common patterns
 ### Auto-size tiles from available memory
@@ -222,8 +246,8 @@ merged = merge_tile_labels(
 ## How tiling and merging work
-See [docs/how-it-works.md](docs/how-it-works.md) for a full explanation.
-Short version:
+See the [Merging labels guide](https://imcf.one/patchworks/guide/merging/) for
+a full explanation. Short version:
 1. Image is split into tiles (with optional overlap for boundary context).
 2. Your function is called independently on each tile. Dask handles parallelism
@@ -253,10 +277,15 @@ tiles where the dask-image approach stalls.
 ## Documentation
-- [Quick Start](docs/quickstart.md)
-- [API Reference](docs/api-reference.md)
-- [How It Works](docs/how-it-works.md)
-- [Examples](docs/examples/)
+Full docs, guides and tutorials: **<https://imcf.one/patchworks/>**
+- [Getting Started](https://imcf.one/patchworks/getting_started/)
+- [User Guide](https://imcf.one/patchworks/guide/tiling/) — tiling, merging,
+  empty-tile skipping, GPU/distributed, OME-ZARR & napari, pitfalls
+- [Examples](https://imcf.one/patchworks/examples/cellpose_2d/) — Cellpose,
+  StarDist, custom functions, standalone merge
+- [API Reference](https://imcf.one/patchworks/api/tile_process/) ·
+  [pdoc API](https://imcf.one/apidocs/patchworks/)
 ---
@@ -269,7 +298,11 @@ Optional:
 - `psutil` — accurate RAM sizing for `tile_shape="auto"`
 - `nvidia-ml-py` — accurate GPU VRAM sizing
 - `tqdm` — progress bars
-- `cellpose` — Cellpose plugin
+- `cellpose` — Cellpose plugin (`patchworks[cellpose]`)
+- `bioio` + readers — convert CZI/LIF/ND2/OME-TIFF/… to OME-ZARR
+  (`patchworks[bioio]`)
+- `imaris-ims-file-reader` — convert Imaris `.ims` (`patchworks[imaris]`)
+- `napari` — interactive viewer plugin (`patchworks[napari]`)
 ---

{patchworks-0.5.0 → patchworks-0.6.0}/docs/examples/custom.md RENAMED

@@ -17,7 +17,7 @@ def threshold_fn(tile: np.ndarray) -> np.ndarray:
     return label(tile > thr).astype("int32")
-result = tile_process("image.zarr", threshold_fn, compute=True)
+result = tile_process("image.zarr", threshold_fn)
 ```
 ## Gaussian + morphological operations
@@ -86,12 +86,12 @@ from patchworks import tile_process
 # From any array-like source
 arr = da.from_array(my_numpy_array, chunks=(1, 1024, 1024))
-result = tile_process(arr, my_fn, compute=True)
+result = tile_process(arr, my_fn)
 # From tifffile
 import tifffile
 import dask.array as da
 arr = da.from_array(tifffile.imread("image.tif", aszarr=True))
-result = tile_process(arr, my_fn, compute=True)
+result = tile_process(arr, my_fn)
 ```

{patchworks-0.5.0 → patchworks-0.6.0}/docs/examples/custom_method.py RENAMED

@@ -41,8 +41,7 @@ result = tile_process(
     my_fn,
     tile_shape=(1, 512, 512),
     overlap=16,
-    compute=True,
     progress=True,
 )
-print(f"Found {result.max()} objects")
+print(f"Found {int(result.max().compute())} objects")

{patchworks-0.5.0 → patchworks-0.6.0}/docs/getting_started.md RENAMED

@@ -89,9 +89,11 @@ objects spanning tile boundaries are merged into a single label.
     ```python
     from patchworks import tile_process
-    result = tile_process("image.zarr", my_fn, compute=True)
+    # returns a lazy dask array; labels are also written into image.zarr by
+    # default (image.zarr/labels/labels/, as a pyramid)
+    result = tile_process("image.zarr", my_fn)
     print(result.shape)  # (z, y, x)
-    print(result.max())  # number of objects found
+    print(int(result.max().compute()))  # number of objects found
     ```
 === "From a dask array"
@@ -101,7 +103,7 @@ objects spanning tile boundaries are merged into a single label.
     from patchworks import tile_process
     arr = da.from_zarr("image.zarr")
-    result = tile_process(arr, my_fn, compute=True)
+    result = tile_process(arr, my_fn)
     ```
 === "Stream to zarr (recommended for large images)"

patchworks-0.6.0/docs/guide/performance.md ADDED

@@ -0,0 +1,60 @@
+# Performance & memory safety
+`tile_process` is built so a run **adapts to whatever machine it lands on** and
+can't run out of RAM/VRAM or freeze the box — without you tuning anything.
+## Automatic, machine-aware concurrency
+The staging step (running your `fn` once per tile to a temp store) and the
+merge step are sized to the host automatically:
+- **GPU** (`use_gpu=True`) → **one tile at a time**, so concurrent evaluations
+  can never exhaust VRAM.
+- **CPU** → as many tiles in flight as fit **80 % of available RAM** (estimated
+  from the tile size), and always **leaving one core free** so the machine
+  stays responsive — it never pins every core.
+The RAM figure is read live via `psutil`; without it, a conservative default is
+used instead of guessing high.
+## Overriding the worker count
+```python
+from patchworks import tile_process
+# let patchworks pick (recommended)
+tile_process("scan.zarr", fn)
+# or cap it yourself (staging threads + merge processes)
+tile_process("scan.zarr", fn, max_workers=8)
+```
+`max_workers` bounds both staging and merging. A running **distributed client**
+manages its own concurrency, so the override is skipped there — configure the
+cluster's memory limits instead.
+## Why it won't OOM or freeze
+| Resource | Guard |
+|----------|-------|
+| RAM | concurrent tiles × tile size × overhead ≤ 80 % of available RAM |
+| VRAM | GPU path runs one tile at a time |
+| CPU | always leaves at least one core free |
+| Disk I/O | each pyramid/stage level is streamed chunk-by-chunk; no whole volume in memory |
+The staging graph itself is kept small — a single fused `map_overlap`
+(halo → `fn` → trim) rather than three separate passes — and there is **no**
+extra read-back of the staged data.
+## Getting more speed
+- `tile_shape="auto"` sizes tiles to free RAM (or VRAM with `use_gpu=True`).
+- `skip_empty=True` with `estimate_empty_tiles()` skips background tiles.
+- A Dask **distributed** cluster (`make_local_cluster`) parallelises across
+  workers/GPUs; patchworks then defers concurrency to the cluster.
+!!! note "What doesn't help here"
+    The merge and relabel steps are already vectorised NumPy + SciPy (C-level)
+    with no per-voxel Python loop, and the pipeline is I/O-bound — so `numba`,
+    `cupy`, `arrow` and `xarray` bring essentially nothing. The real levers are
+    tile size, concurrency (above) and zarr chunking.
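The RAM and CPU guards described in this guide reduce to a few lines of arithmetic. The sketch below is a standalone illustration, not the package's code: `estimate_workers` is a hypothetical helper that takes the available memory and core count as explicit arguments (assumptions made here so the numbers are reproducible), while the defaults `fn_overhead=4` and `ram_fraction=0.8` follow the values shown in the `_chunks.py` hunk.

```python
def estimate_workers(tile_nbytes, avail_bytes, cpu_count, *, use_gpu=False,
                     fn_overhead=4, ram_fraction=0.8):
    """Hypothetical stand-in that mirrors the guard arithmetic of the guide."""
    if use_gpu:
        return 1  # GPU path: one tile at a time
    cpu_cap = max(1, cpu_count - 1)  # leave one core free
    per_tile = max(1, tile_nbytes * max(1, fn_overhead))  # bytes per in-flight tile
    mem_cap = max(1, int(avail_bytes * ram_fraction) // per_tile)
    return max(1, min(cpu_cap, mem_cap))


GIB = 1024**3
MIB = 1024**2

# Assumed machine for illustration: 16 GiB available RAM, 8 cores.
small = estimate_workers(64 * MIB, 16 * GIB, 8)   # CPU-bound: 7
medium = estimate_workers(1 * GIB, 16 * GIB, 8)   # RAM-bound: 3
large = estimate_workers(4 * GIB, 16 * GIB, 8)    # RAM-bound, floor of 1
gpu = estimate_workers(64 * MIB, 16 * GIB, 8, use_gpu=True)
print(small, medium, large, gpu)  # prints: 7 3 1 1
```

With small tiles the core count is the limit; once a tile times its overhead approaches the RAM budget, memory takes over and the count falls towards 1.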

{patchworks-0.5.0 → patchworks-0.6.0}/docs/index.md RENAMED

@@ -53,7 +53,7 @@ def my_fn(tile):
     return label(tile > threshold_otsu(tile)).astype("int32")
-result = tile_process("image.zarr", my_fn, compute=True)
+result = tile_process("image.zarr", my_fn)
 ```
 Any function. Any image.

{patchworks-0.5.0 → patchworks-0.6.0}/mkdocs.yml RENAMED

@@ -38,6 +38,7 @@ nav:
   - Merging labels: guide/merging.md
   - Empty tile skipping: guide/skip_empty.md
   - GPU & distributed: guide/gpu_distributed.md
+  - Performance & memory: guide/performance.md
   - OME-ZARR & napari: guide/ome_zarr_napari.md
   - Pitfalls: guide/pitfalls.md
 - Examples:

{patchworks-0.5.0 → patchworks-0.6.0}/src/patchworks/_chunks.py RENAMED

@@ -57,6 +57,51 @@ def _get_available_memory() -> int:
         return 8 * 1024**3
+def safe_worker_count(
+    tile_nbytes: int,
+    *,
+    use_gpu: bool = False,
+    fn_overhead: int = 4,
+    ram_fraction: float = 0.8,
+) -> int:
+    """Concurrent tiles that fit the machine without OOM or a CPU freeze.
+    Bounds the threaded scheduler by two limits and takes the smaller:
+    * **CPU** — leaves at least one core free so the box stays responsive
+      (never pins every core).
+    * **RAM** — at most ``ram_fraction`` of available memory, assuming each
+      in-flight tile needs ``fn_overhead`` copies (halo + output + temporaries).
+    On GPU the answer is always 1: one evaluation at a time so concurrent
+    tiles can never exhaust VRAM. Without ``psutil`` it returns a conservative
+    default rather than guessing high.
+    Parameters
+    ----------
+    tile_nbytes : int
+        Size of one tile in bytes (``prod(tile_shape) * dtype.itemsize``).
+    use_gpu : bool, optional
+        Whether tiles are processed on the GPU.
+    fn_overhead : int, optional
+        Assumed peak number of tile-sized buffers alive per worker.
+    ram_fraction : float, optional
+        Fraction of available RAM the staging step may use.
+    Returns
+    -------
+    int
+        Worker-thread count (always >= 1).
+    """
+    cpu_cap = max(1, (os.cpu_count() or 1) - 1)
+    if use_gpu:
+        return 1
+    avail = _get_available_memory()
+    per_tile = max(1, int(tile_nbytes) * max(1, fn_overhead))
+    mem_cap = max(1, int(avail * ram_fraction) // per_tile)
+    return max(1, min(cpu_cap, mem_cap))
 def _get_gpu_memory() -> int:
     """Return free GPU VRAM in bytes. Falls back to 8 GiB default."""
     try:

{patchworks-0.5.0 → patchworks-0.6.0}/src/patchworks/_core.py RENAMED

@@ -11,7 +11,7 @@ from typing import Any, Callable, Union
 import dask.array as da
 import numpy as np
-from ._chunks import auto_tile_shape
+from ._chunks import auto_tile_shape, safe_worker_count
 from ._cluster import _client_is_in_process, _distributed_client
 from ._io import _auto_empty_threshold, load_ome_zarr
 from ._merge import zarr_native_merge
@@ -56,6 +56,7 @@ def tile_process(
     channel: int | None = 0,
     level: int = 0,
     use_gpu: bool = False,
+    max_workers: int | None = None,
     progress: bool = False,
     write_to: Union[str, Path, None] = None,
     output_component: str = "labels",
@@ -114,6 +115,13 @@ def tile_process(
         Pyramid level when *image* is a path (0 = full resolution).
     use_gpu:
         When ``tile_shape="auto"``, size tiles against GPU VRAM instead of RAM.
+        Also forces staging to one tile at a time (no VRAM contention).
+    max_workers:
+        Cap the worker threads/processes used for staging and merging. ``None``
+        (default) auto-sizes to the machine: bounded by available RAM (tile
+        size) and CPU (leaves one core free) so a run can neither OOM nor pin
+        every core. Ignored when a distributed client is active (it manages its
+        own concurrency).
     progress:
         Show a progress bar during the tile-writing and relabel steps.
     write_to:
@@ -283,11 +291,6 @@ def tile_process(
         for ax, c in enumerate(image.chunks)
     }
-    if overlap > 0:
-        # boundary="none" is required: only this boundary mode composes with
-        # trim_overlap to recover the original shape. "reflect" keeps the halo.
-        image = da.overlap.overlap(image, depth=_depth, boundary="none")
     # Wrap fn with optional empty-tile skipping
     _skip_thr = empty_threshold
     if skip_empty and _skip_thr is None:
@@ -303,28 +306,43 @@ def tile_process(
             logger.debug("process tile %s shape=%s", loc, block.shape)
         return fn(block)
-    labeled = image.map_blocks(
-        active_fn,
-        dtype=np.int32,
-        meta=np.empty((0,) * image.ndim, dtype=np.int32),
-    )
-    # Trim the overlap halo so staged tiles have clean boundaries for the
-    # boundary-slab scan. Without this the scan reads halo-expanded chunks and
-    # the merged output is larger than the input.
+    _meta = np.empty((0,) * image.ndim, dtype=np.int32)
     if overlap > 0:
-        labeled = da.overlap.trim_overlap(
-            labeled, depth=_depth, boundary="none"
+        # One fused pass: add the halo, run fn, trim it back off. map_overlap
+        # materialises only the halos it needs (no separate overlapped array)
+        # and keeps the task graph small. boundary="none" + trim recovers the
+        # original shape, so the boundary-slab scan reads clean tiles.
+        labeled = da.map_overlap(
+            active_fn,
+            image,
+            depth=_depth,
+            boundary="none",
+            trim=True,
+            dtype=np.int32,
+            meta=_meta,
         )
+    else:
+        labeled = image.map_blocks(active_fn, dtype=np.int32, meta=_meta)
-    # With no distributed client the threaded scheduler runs many tiles at
-    # once. For GPU that means several evals sharing one device → CUDA OOM.
-    # Pin to a single worker thread so evals run serially. A distributed
-    # client manages its own concurrency, so skip the override there.
+    # Bound staging concurrency to the machine so it can neither OOM nor pin
+    # every core:
+    #   - GPU → 1 eval at a time (no VRAM contention),
+    #   - CPU → as many tiles as fit RAM, leaving one core free.
+    # A distributed client manages its own concurrency, so skip the override.
     import dask as _dask
-    if _active is None and use_gpu:
-        _sched_ctx: Any = _dask.config.set(scheduler="threads", num_workers=1)
+    _tile_nbytes = int(np.prod(labeled.chunksize)) * labeled.dtype.itemsize
+    if _active is None:
+        _workers = (
+            max_workers
+            if max_workers is not None
+            else safe_worker_count(_tile_nbytes, use_gpu=use_gpu)
+        )
+        _workers = max(1, min(_workers, os.cpu_count() or 1))
+        logger.info("Staging with %d worker thread(s)", _workers)
+        _sched_ctx: Any = _dask.config.set(
+            scheduler="threads", num_workers=_workers
+        )
     else:
         _sched_ctx = _nullcontext()
@@ -347,24 +365,10 @@ def tile_process(
         _stage_to_zarr(labeled, stage_path, "staged", progress)
     labeled = da.from_zarr(stage_path, component="staged")
-    if skip_empty and _skip_thr is not None:
-        def _tile_max(block: np.ndarray) -> np.ndarray:
-            return np.full((1,) * block.ndim, int(block.max()), dtype=np.int32)
-        _tile_maxes = labeled.map_blocks(
-            _tile_max,
-            dtype=np.int32,
-            chunks=tuple(tuple(1 for _ in c) for c in labeled.chunks),
-        ).compute()
-        _n_skip = int((_tile_maxes == 0).sum())
-        logger.info(
-            "skip_empty: %d/%d tiles ran fn, %d skipped (max<=%.4g)",
-            int(_tile_maxes.size) - _n_skip,
-            int(_tile_maxes.size),
-            _n_skip,
-            _skip_thr,
-        )
+    # NB: no post-staging skip-count pass here — counting skipped tiles by
+    # re-reading the whole staged store off disk would double the I/O of the
+    # entire run just for a log line. Use estimate_empty_tiles() up front for
+    # that figure instead.
     def _cleanup_stage():
         if not keep_stage:
@@ -373,7 +377,9 @@ def tile_process(
             shutil.rmtree(stage_path, ignore_errors=True)
             logger.info("Removed stage store %s", stage_path)
-    _nw = min(4, os.cpu_count() or 1)
+    # Merge runs in worker processes (each holds one chunk + an mmap'd LUT);
+    # size it to RAM/CPU like staging, capped so we don't spawn a process storm.
+    _nw = max_workers or max(1, min(safe_worker_count(_tile_nbytes), 8))
     # Default: input is a .zarr store and no explicit write_to → labels go back
     # *into* the input store under the NGFF labels/<name>/ group with an auto
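
The staging hunk above replaces three separate passes with one `da.map_overlap(..., boundary="none", trim=True)` call: add a halo, run `fn`, trim the halo off. The idea can be illustrated without dask. The sketch below uses plain Python lists in one dimension; `smooth` and `process_tiled` are hypothetical names invented for this illustration and are not part of patchworks or dask.

```python
def smooth(values):
    """3-point moving average; at the ends only existing neighbours are used."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def process_tiled(values, fn, tile, halo):
    """Apply fn tile by tile: extend each tile by a halo, run fn, trim the halo."""
    out = []
    n = len(values)
    for start in range(0, n, tile):
        stop = min(start + tile, n)
        lo = max(0, start - halo)   # no padding past the array edge
        hi = min(n, stop + halo)
        block = fn(values[lo:hi])   # fn sees the tile plus its halo
        offset = start - lo
        out.extend(block[offset: offset + (stop - start)])  # trim back to the tile
    return out


data = [float(x * x % 7) for x in range(20)]

# With a halo as wide as the filter's reach, tiling is invisible in the result.
with_halo = process_tiled(data, smooth, tile=5, halo=1)
# Without a halo, values next to tile boundaries differ from the global result.
without_halo = process_tiled(data, smooth, tile=5, halo=0)
print(with_halo == smooth(data), without_halo == smooth(data))  # prints: True False
```

Clamping the halo at the array edges rather than padding corresponds to the `boundary="none"` choice in the hunk, which is what lets the trimmed output recover the original shape.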

{patchworks-0.5.0 → patchworks-0.6.0}/tests/test_core.py RENAMED

@@ -247,3 +247,27 @@ def test_estimate_empty_tiles():
     assert info["n_tiles"] == 4
     assert info["n_occupied"] == 2
     assert info["empty_fraction"] == 0.5
+def test_safe_worker_count_bounds():
+    import os
+    from patchworks._chunks import safe_worker_count
+    # GPU → always serial (no VRAM contention)
+    assert safe_worker_count(10**6, use_gpu=True) == 1
+    # Absurdly large tile → memory-bound to 1
+    assert safe_worker_count(10**15) == 1
+    # Tiny tile → CPU-bound, leaves a core free, always >= 1
+    n = safe_worker_count(1024)
+    assert 1 <= n <= max(1, (os.cpu_count() or 1) - 1)
+def test_tile_process_max_workers():
+    import dask.array as da
+    from patchworks import tile_process
+    arr = da.from_array(_make_image((2, 32, 32)), chunks=(1, 32, 32))
+    result = tile_process(arr, _label_fn, max_workers=1).compute()
+    assert result.shape == (2, 32, 32)