PyPI - patchworks - Versions diffs - 0.8.0__tar.gz → 0.9.0__tar.gz - Mend

patchworks 0.8.0tar.gz → 0.9.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (51)

{patchworks-0.8.0 → patchworks-0.9.0}/.github/workflows/lint.yml RENAMED

@@ -23,3 +23,13 @@ jobs:
         with:
           version: "0.15.18"
           args: "format --check"
+  markdownlint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: markdownlint
+        uses: DavidAnson/markdownlint-cli2-action@v20
+        with:
+          globs: "**/*.md"

patchworks-0.9.0/.markdownlint-cli2.yaml ADDED

@@ -0,0 +1,16 @@
+# markdownlint-cli2 configuration — see https://github.com/DavidAnson/markdownlint
+config:
+  MD013: false # line length (tables, code and long prose)
+  MD033: false # inline HTML (badges, <img>)
+  MD041: false # first line need not be a top-level heading (badges)
+  MD024:
+    siblings_only: true # repeated headings OK across different sections
+  MD046: false # allow both indented and fenced code blocks
+  MD060: false # don't enforce table pipe spacing style
+globs:
+  - "**/*.md"
+ignores:
+  - "site"
+  - ".pixi"
+  - "PUBLIC"
+  - "**/node_modules"

{patchworks-0.8.0 → patchworks-0.9.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: patchworks
-Version: 0.8.0
+Version: 0.9.0
 Summary: Tiled processing of arbitrarily large images with globally consistent labels
 Project-URL: Homepage, https://github.com/imcf/patchworks
 Project-URL: Issues, https://github.com/imcf/patchworks/issues
@@ -73,7 +73,7 @@ Description-Content-Type: text/markdown
 > Tiled processing of arbitrarily large images — any image, any function.
-```
+```text
 ┌──────┬──────┬──────┐     fn(tile) → labels      ┌──────┬──────┬──────┐
 │ tile │ tile │ tile │  ─────────────────────►    │  1   │  2   │  3   │
 ├──────┼──────┼──────┤                            ├──────┼──────┼──────┤
@@ -361,6 +361,7 @@ Full docs, guides and tutorials: **<https://imcf.one/patchworks/>**
 - dask[array], numpy, zarr, scipy
 Optional:
 - `psutil` — accurate RAM sizing for `tile_shape="auto"`
 - `nvidia-ml-py` — accurate GPU VRAM sizing
 - `tqdm` — progress bars

{patchworks-0.8.0 → patchworks-0.9.0}/README.md RENAMED

@@ -7,7 +7,7 @@
 > Tiled processing of arbitrarily large images — any image, any function.
-```
+```text
 ┌──────┬──────┬──────┐     fn(tile) → labels      ┌──────┬──────┬──────┐
 │ tile │ tile │ tile │  ─────────────────────►    │  1   │  2   │  3   │
 ├──────┼──────┼──────┤                            ├──────┼──────┼──────┤
@@ -295,6 +295,7 @@ Full docs, guides and tutorials: **<https://imcf.one/patchworks/>**
 - dask[array], numpy, zarr, scipy
 Optional:
 - `psutil` — accurate RAM sizing for `tile_shape="auto"`
 - `nvidia-ml-py` — accurate GPU VRAM sizing
 - `tqdm` — progress bars

{patchworks-0.8.0 → patchworks-0.9.0}/docs/examples/stardist.md RENAMED

@@ -47,18 +47,18 @@ tile_process(
     Load the model **outside** the `fn` closure. If you load it inside,
     it will be re-initialised (and potentially re-downloaded) once per tile.
-    For distributed execution, use `functools.partial` with a cached model:
+For distributed execution, use `functools.partial` with a cached model:
-    ```python
-    from functools import lru_cache
+```python
+from functools import lru_cache
-    @lru_cache(maxsize=1)
-    def _get_model():
-        return StarDist2D.from_pretrained("2D_versatile_fluo")
+@lru_cache(maxsize=1)
+def _get_model():
+    return StarDist2D.from_pretrained("2D_versatile_fluo")
-    def stardist_fn(tile):
-        model = _get_model()
-        ...
-    ```
+def stardist_fn(tile):
+    model = _get_model()
+    ...
+```

{patchworks-0.8.0 → patchworks-0.9.0}/docs/getting_started.md RENAMED

@@ -46,11 +46,11 @@ patchworks can be installed from PyPI on all operating systems, for Python ≥ 3
 ## The one function you need
-```python
-from patchworks import tile_process
+    ```python
+    from patchworks import tile_process
-result = tile_process(image, fn)
-```
+    result = tile_process(image, fn)
+    ```
 `tile_process(image, fn)` splits `image` into tiles, runs `fn` on each tile,
 and returns a globally consistent label array.
@@ -65,17 +65,17 @@ and returns a globally consistent label array.
 patchworks is method-agnostic. Your function receives a NumPy array (one tile)
 and must return an integer label array of the same shape:
-```python
-import numpy as np
+    ```python
+    import numpy as np
-def my_fn(tile: np.ndarray) -> np.ndarray:
-    from skimage.filters import threshold_otsu
-    from skimage.measure import label
+    def my_fn(tile: np.ndarray) -> np.ndarray:
+        from skimage.filters import threshold_otsu
+        from skimage.measure import label
-    binary = tile > threshold_otsu(tile)
-    return label(binary).astype("int32")
-```
+        binary = tile > threshold_otsu(tile)
+        return label(binary).astype("int32")
+    ```
 The function is called independently on every tile. patchworks ensures that
 objects spanning tile boundaries are merged into a single label.
@@ -155,14 +155,14 @@ objects spanning tile boundaries are merged into a single label.
 Methods like Cellpose and StarDist need spatial context at tile boundaries.
 Use `overlap` (in voxels) so boundary objects are fully visible:
-```python
-result = tile_process(
-    "image.zarr",
-    my_fn,
-    tile_shape=(1, 2048, 2048),
-    overlap=20,  # 20-voxel halo on every side
-)
-```
+    ```python
+    result = tile_process(
+        "image.zarr",
+        my_fn,
+        tile_shape=(1, 2048, 2048),
+        overlap=20,  # 20-voxel halo on every side
+    )
+    ```
 !!! info "How overlap works"
     Each tile is expanded by `overlap` voxels on every side before calling `fn`.
@@ -173,22 +173,22 @@ result = tile_process(
 ## Use Cellpose
-```python
-from patchworks import tile_process
-from patchworks.plugins.cellpose import cellpose_fn
-fn = cellpose_fn("cyto3", gpu=True, diameter=30)
-tile_process(
-    "image.zarr",
-    fn,
-    channel=0,
-    tile_shape=(1, 2048, 2048),
-    overlap=20,
-    write_to="labels.zarr",
-    progress=True,
-)
-```
+    ```python
+    from patchworks import tile_process
+    from patchworks.plugins.cellpose import cellpose_fn
+    fn = cellpose_fn("cyto3", gpu=True, diameter=30)
+    tile_process(
+        "image.zarr",
+        fn,
+        channel=0,
+        tile_shape=(1, 2048, 2048),
+        overlap=20,
+        write_to="labels.zarr",
+        progress=True,
+    )
+    ```
 See the [Cellpose 2-D example](examples/cellpose_2d.md) for the full workflow.

{patchworks-0.8.0 → patchworks-0.9.0}/docs/guide/gpu_distributed.md RENAMED

@@ -60,7 +60,7 @@ in the same process as the kernel. When your segmentation function holds the
 Python GIL (every PyTorch/CUDA `eval` does), the worker thread can't send
 heartbeats. The scheduler declares it dead, and the merge fails:
-```
+```python
 FutureCancelledError: lost dependencies
 ```

{patchworks-0.8.0 → patchworks-0.9.0}/docs/guide/merging.md RENAMED

@@ -9,7 +9,7 @@ even though it's the same cell.
 patchworks solves this with a zarr-native merge algorithm:
-```
+```text
 Tile A labels:        Tile B labels:        After merge:
 ┌────────────┐        ┌────────────┐        ┌──────────────────────┐
 │  3   1   2 │        │  1   4   2 │        │  3   1   2 │ 501 5 502│
@@ -32,7 +32,7 @@ Each tile's labels are written to a temporary zarr once. This is critical:
 without staging, any downstream operation that reads the label array re-runs
 your segmentation function. The merge internally reads labels multiple times.
-```
+```text
 tile_process calls fn once per tile → staged zarr
                                          │
                          merge reads from staged zarr (no fn calls)

{patchworks-0.8.0 → patchworks-0.9.0}/docs/guide/ome_zarr_napari.md RENAMED

@@ -76,6 +76,28 @@ and streaming the downsampled result out through dask with bounded chunks. The
 graph never chains level-on-level and no whole plane/volume is held in RAM, so
 terabyte images convert in bounded memory.
+### Sharding (fewer files)
+A big array becomes tens of thousands of tiny chunk files, which strain
+filesystems and object stores. Sharding packs many chunks into one **shard**
+file (zarr v3), cutting the file count ~100×:
+```python
+to_ome_zarr("scan.ims", "scan.zarr", shard=True)        # auto ~512 MB shards
+to_ome_zarr("scan.ims", "scan.zarr", shard=(1, 16, 2048, 2048))  # explicit
+```
+Default is `shard=False` for maximum reader compatibility — sharding is
+zarr-v3-only, so older tools may not read it (your zarr/napari stack does).
+A sharded write holds ~one shard per worker in RAM, so very large shards cost
+memory.
+### Progress
+All write steps show a dask progress bar **by default** (`progress=True`), so
+you can see how long a conversion will take. Pass `progress=False` to silence
+it.
 !!! note "Install the readers you need"
     `pip install "patchworks[bioio]"` pulls `bioio` plus the `bioio-bioformats`
     catch-all reader (needs a JVM). For speed, add native readers for your
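
The file-count effect described in the new "Sharding" section of this hunk is simple arithmetic. The sketch below is an illustration only: the array shape and chunk shape are hypothetical, `count_files` is not part of patchworks, and it assumes one file per chunk (or per shard) while ignoring metadata files.

```python
from math import ceil, prod


def count_files(shape, chunk_shape, shard_shape=None):
    """Count data files: one per chunk without sharding, one per shard with it."""
    unit = shard_shape if shard_shape is not None else chunk_shape
    return prod(ceil(s / u) for s, u in zip(shape, unit))


# Hypothetical 4-D array (c, z, y, x) and chunking, chosen only for illustration.
shape = (2, 256, 16384, 16384)
chunks = (1, 1, 512, 512)
shards = (1, 16, 2048, 2048)  # the explicit shard shape from the docs example

unsharded = count_files(shape, chunks)
sharded = count_files(shape, chunks, shards)
reduction = unsharded // sharded  # equals the number of chunks packed per shard
```

With these assumed shapes each shard holds 16 × 4 × 4 = 256 chunks, so the file count drops by that factor; the actual ratio depends on the chunk and shard shapes in use.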

{patchworks-0.8.0 → patchworks-0.9.0}/docs/guide/pitfalls.md RENAMED

@@ -30,7 +30,7 @@ single-GPU runs — patchworks pins it to 1 thread automatically).
 patchworks detects in-process clients at startup and raises immediately:
-```
+```python
 RuntimeError: Active Dask client uses an in-process worker (processes=False).
 This breaks the label merge when fn holds the GIL. Use a process-based
 cluster instead:

{patchworks-0.8.0 → patchworks-0.9.0}/docs/guide/skip_empty.md RENAMED

@@ -88,6 +88,6 @@ tile_process(
 After a `tile_process` run with `skip_empty=True`, the log reports exactly
 how many tiles ran your function:
-```
+```text
 INFO patchworks._core: skip_empty: 486/2200 tiles ran fn, 1714 skipped (max<=412.0)
 ```

{patchworks-0.8.0 → patchworks-0.9.0}/docs/guide/tiling.md RENAMED

@@ -13,6 +13,7 @@ peak RAM during segmentation is approximately one tile's worth of data.
 ## Choosing a tile size
 The right tile size depends on:
 - Your available RAM (or GPU VRAM)
 - The minimum context your segmentation method needs (objects should fit fully
   inside a tile, or you need overlap)
@@ -62,7 +63,7 @@ Methods that need spatial context (Cellpose, StarDist, U-Net) produce wrong
 results near tile edges: objects at the boundary are cut off. Overlap fixes this
 by expanding each tile by `overlap` voxels on every side.
-```
+```text
 No overlap:        With overlap=20:
 ┌──────────┐      ┌──────────────────┐
 │          │      │  ░░░░░░░░░░░░░░  │
@@ -86,4 +87,4 @@ No overlap:        With overlap=20:
     automatically clips the depth per axis, so z-tiles of size 1 (typical in
     2-D Cellpose mode) get `depth=0` in z even if you pass `overlap=20`.
-    Axes that are too small for the requested overlap simply get a smaller halo.
+  Axes that are too small for the requested overlap simply get a smaller halo.
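
The per-axis clipping described in this note can be pictured as follows. This is a sketch under a stated assumption: the diff does not show the actual rule, so `clip_depth` is hypothetical and assumes the halo on an axis is capped at the tile size minus one, which reproduces the `depth=0` case for z-tiles of size 1.

```python
def clip_depth(tile_shape, overlap):
    """Return a per-axis halo depth, clipped for axes that are too small."""
    # Assumed rule: the halo on an axis can be at most (tile size - 1) voxels.
    return tuple(min(overlap, max(size - 1, 0)) for size in tile_shape)


depth_2d_mode = clip_depth((1, 2048, 2048), 20)  # z-tiles of size 1
depth_thin_axis = clip_depth((8, 2048), 20)      # one axis smaller than the overlap
```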

{patchworks-0.8.0 → patchworks-0.9.0}/docs/index.md RENAMED

@@ -2,7 +2,7 @@
 **Tiled processing of arbitrarily large images — any image, any function.**
-```
+```text
 ┌──────┬──────┬──────┐                    ┌──────┬──────┬──────┐
 │      │      │      │   fn(tile) → IDs   │  1   │  2   │  3   │
 │      │      │      │  ───────────────►  │      │      │      │

{patchworks-0.8.0 → patchworks-0.9.0}/src/patchworks/_chunks.py RENAMED

@@ -49,6 +49,14 @@ _GPU_MEMORY_FALLBACK = 8 * 1024**3
 def _get_available_memory() -> int:
+    """Return available system RAM in bytes.
+    Returns
+    -------
+    int
+        Available memory via ``psutil``, or an 8 GiB fallback if it is not
+        installed.
+    """
     try:
         import psutil
@@ -103,7 +111,14 @@ def safe_worker_count(
 def _get_gpu_memory() -> int:
-    """Return free GPU VRAM in bytes. Falls back to 8 GiB default."""
+    """Return free GPU VRAM in bytes.
+    Returns
+    -------
+    int
+        Free VRAM of GPU 0 via ``nvidia-ml-py``, or an 8 GiB fallback if the
+        query fails.
+    """
     try:
         import pynvml
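
The new docstrings describe an optional-dependency pattern: query the real value if the library is importable, otherwise return a fixed 8 GiB default. A minimal sketch of the RAM variant, assuming only what the docstring states (the function and constant names here are illustrative, and the body of the real `_get_available_memory` is not fully shown in the diff):

```python
MEMORY_FALLBACK = 8 * 1024**3  # 8 GiB, the fallback size named in the docstring


def available_memory() -> int:
    """Return available RAM in bytes, or the fallback if psutil is missing."""
    try:
        import psutil  # optional dependency
    except ImportError:
        return MEMORY_FALLBACK
    return int(psutil.virtual_memory().available)


mem = available_memory()
```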

{patchworks-0.8.0 → patchworks-0.9.0}/src/patchworks/_cluster.py RENAMED

@@ -9,7 +9,14 @@ logger = logging.getLogger(__name__)
 def _distributed_client():
-    """Return the active dask.distributed Client, or None."""
+    """Return the active dask.distributed Client, or None.
+    Returns
+    -------
+    distributed.Client or None
+        The current client, or ``None`` if none is active / distributed is not
+        installed.
+    """
     try:
         from dask.distributed import get_client
@@ -19,12 +26,22 @@ def _distributed_client():
 def _client_is_in_process(client) -> bool:
-    """True if *client* runs its worker in this process (processes=False).
+    """Whether *client* runs its worker in this process (``processes=False``).
     An in-process worker shares the GIL. A long task that holds the GIL
     (e.g. a Cellpose/torch eval) starves the worker heartbeat, the scheduler
     declares it dead, and the P2P merge barrier drops its inputs →
     "FutureCancelledError: lost dependencies".
+    Parameters
+    ----------
+    client : distributed.Client
+        The client to inspect.
+    Returns
+    -------
+    bool
+        True if any worker address uses the ``inproc://`` transport.
     """
     try:
         for addr in client.scheduler_info().get("workers", {}):
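
The check documented above ("True if any worker address uses the `inproc://` transport") can be sketched on a plain dictionary shaped like the result of `client.scheduler_info()`. This is an illustration, not the package's code: `is_in_process` is a hypothetical name and the sample addresses are made up.

```python
def is_in_process(scheduler_info: dict) -> bool:
    """True if any worker address uses the inproc:// transport."""
    workers = scheduler_info.get("workers", {})
    return any(str(addr).startswith("inproc://") for addr in workers)


# Made-up scheduler_info payloads, keyed by worker address.
threaded = {"workers": {"inproc://192.168.0.2/4242/1": {}}}
process_based = {"workers": {"tcp://127.0.0.1:40001": {}, "tcp://127.0.0.1:40002": {}}}

threaded_flag = is_in_process(threaded)
process_flag = is_in_process(process_based)
```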

{patchworks-0.8.0 → patchworks-0.9.0}/src/patchworks/_core.py RENAMED

@@ -23,7 +23,23 @@ logger = logging.getLogger(__name__)
 def _stage_to_zarr(
     arr: da.Array, path: str, component: str, show_progress: bool
 ) -> None:
-    """Write *arr* to zarr *path/component*, never loading it into RAM."""
+    """Write *arr* to zarr ``path/component``, never loading it into RAM.
+    Parameters
+    ----------
+    arr : da.Array
+        Array to materialise to disk.
+    path : str
+        Zarr store path.
+    component : str
+        Array name within the store.
+    show_progress : bool
+        Show a progress bar while computing.
+    Returns
+    -------
+    None
+    """
     import dask
     lazy_write = arr.to_zarr(
@@ -57,7 +73,7 @@ def tile_process(
     level: int = 0,
     use_gpu: bool = False,
     max_workers: int | None = None,
-    progress: bool = False,
+    progress: bool = True,
     write_to: Union[str, Path, None] = None,
     output_component: str = "labels",
     pyramid_levels: int = 5,
@@ -123,7 +139,8 @@ def tile_process(
         every core. Ignored when a distributed client is active (it manages its
         own concurrency).
     progress:
-        Show a progress bar during the tile-writing and relabel steps.
+        Show progress bars for staging, the label write and the pyramid
+        (default ``True``). Set ``False`` to silence them.
     write_to:
         Explicit output zarr store path. Overrides the default behaviour: the
         merged labels are written here as a single-resolution array named
@@ -297,6 +314,20 @@ def tile_process(
         _skip_thr = _auto_empty_threshold(image_for_threshold, channel, level)
     def active_fn(block, block_info=None):
+        """Run *fn* on one tile, or return zeros for an empty tile.
+        Parameters
+        ----------
+        block : np.ndarray
+            One image tile.
+        block_info : dict or None
+            Dask block metadata (used for logging the tile location).
+        Returns
+        -------
+        np.ndarray
+            Integer labels, or an all-zero tile when skipped.
+        """
         loc = block_info[0].get("chunk-location") if block_info else "?"
         if skip_empty and block.size and block.max() <= _skip_thr:
             if verbose:
@@ -398,6 +429,12 @@ def tile_process(
     # that figure instead.
     def _cleanup_stage():
+        """Delete the temporary stage store unless ``keep_stage`` is set.
+        Returns
+        -------
+        None
+        """
         if not keep_stage:
             import shutil
@@ -457,6 +494,7 @@ def tile_process(
         name=output_component,
         n_levels=pyramid_levels,
         downscale=pyramid_downscale,
+        progress=progress,
         overwrite=True,
     )
     shutil.rmtree(os.path.dirname(_merge_out), ignore_errors=True)

{patchworks-0.8.0 → patchworks-0.9.0}/src/patchworks/_io.py RENAMED

@@ -75,6 +75,16 @@ def _otsu_threshold(sample: np.ndarray) -> float:
     Operates on the full distribution including zeros — zeros are background
     pixels and must be included so Otsu can find the signal/background boundary.
+    Parameters
+    ----------
+    sample : np.ndarray
+        Flat intensity sample.
+    Returns
+    -------
+    float
+        The Otsu threshold, or ``0.0`` when the sample is degenerate.
     """
     try:
         from skimage.filters import threshold_otsu
@@ -89,7 +99,22 @@ def _otsu_threshold(sample: np.ndarray) -> float:
 def _auto_empty_threshold(
     image: da.Array, channel: int | None, level: int
 ) -> float:
-    """Pick an empty-tile threshold from a cheap bounded sample (Otsu)."""
+    """Pick an empty-tile threshold from a cheap bounded sample (Otsu).
+    Parameters
+    ----------
+    image : da.Array
+        Image to sample.
+    channel : int or None
+        Channel hint (kept for signature symmetry).
+    level : int
+        Pyramid level hint (kept for signature symmetry).
+    Returns
+    -------
+    float
+        Otsu threshold over a few small centred windows.
+    """
     n = image.ndim
     win = [min(64 if i >= n - 3 else s, s) for i, s in enumerate(image.shape)]
     win = [min(w, 256) if i >= n - 2 else w for i, w in enumerate(win)]

{patchworks-0.8.0 → patchworks-0.9.0}/src/patchworks/_merge.py RENAMED

@@ -55,6 +55,25 @@ _merge_out_comp: "str | None" = None
 def _init_worker(lut_path, staged_path, staged_comp, out_path, out_comp):
+    """Initialise a merge worker process with the shared paths and LUT.
+    Parameters
+    ----------
+    lut_path : str
+        Path to the relabel lookup table (loaded memory-mapped, read-only).
+    staged_path : str
+        Path to the staged-labels zarr store.
+    staged_comp : str
+        Component name within the staged store.
+    out_path : str
+        Path to the output zarr store.
+    out_comp : str
+        Component name within the output store.
+    Returns
+    -------
+    None
+    """
     global _merge_lut, _merge_lut_path, _merge_staged_path, _merge_staged_comp
     global _merge_out_path, _merge_out_comp
     _merge_lut = np.load(
@@ -68,6 +87,17 @@ def _init_worker(lut_path, staged_path, staged_comp, out_path, out_comp):
 def _relabel_chunk_worker(chunk_slice: tuple) -> None:
+    """Apply the relabel LUT to one chunk and write it to the output store.
+    Parameters
+    ----------
+    chunk_slice : tuple
+        The slice selecting this chunk in both stores.
+    Returns
+    -------
+    None
+    """
     src = zarr.open_group(_merge_staged_path, mode="r")[_merge_staged_comp]
     dst = zarr.open_group(_merge_out_path, mode="r+")[_merge_out_comp]
     block = np.asarray(src[chunk_slice], dtype=np.int64)
@@ -87,6 +117,20 @@ def _relabel_chunk_worker(chunk_slice: tuple) -> None:
 def _boundary_face_specs(
     shape: tuple[int, ...], chunk_shape: tuple[int, ...]
 ) -> list[tuple[int, int]]:
+    """Enumerate interior chunk boundaries to scan for touching labels.
+    Parameters
+    ----------
+    shape : tuple of int
+        Array shape.
+    chunk_shape : tuple of int
+        Chunk shape.
+    Returns
+    -------
+    list of tuple of int
+        ``(axis, position)`` pairs, one per interior chunk boundary.
+    """
     specs = []
     for ax, (s, cs) in enumerate(zip(shape, chunk_shape)):
         pos = cs
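
The hunk cuts off after `pos = cs`, so the rest of `_boundary_face_specs` is not visible. Going by the new docstring ("`(axis, position)` pairs, one per interior chunk boundary"), one plausible completion is sketched below; the loop after `pos = cs` is an assumption, and chunk sizes are assumed to be positive.

```python
def boundary_face_specs(shape, chunk_shape):
    """List (axis, position) for every interior chunk boundary."""
    specs = []
    for ax, (s, cs) in enumerate(zip(shape, chunk_shape)):
        pos = cs
        # Assumed continuation: step through the axis one chunk at a time and
        # record each boundary that lies strictly inside the array.
        while pos < s:
            specs.append((ax, pos))
            pos += cs
    return specs


specs = boundary_face_specs((4, 10), (4, 4))
```

Axis 0 has a single chunk and therefore no interior boundary; axis 1 has boundaries at positions 4 and 8.
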
@@ -105,6 +149,21 @@ def _scan_touching_pairs(
     is bounded to one chunk (~200 MB). Reading the full face at once
     (slice(None) on face axes) would allocate face_area × 8 bytes in one shot —
     e.g. 37888 × 27392 × 8 = 8 GiB for a single z-face (OOM on real datasets).
+    Parameters
+    ----------
+    zarr_path : str
+        Path to the staged-labels zarr store.
+    component : str
+        Component name within the store.
+    chunk_shape : tuple of int
+        Chunk shape (sets the per-read column size).
+    Returns
+    -------
+    np.ndarray
+        ``(N, 2)`` int64 array of unique label pairs touching across a
+        boundary.
     """
     root = zarr.open_group(zarr_path, mode="r")
     arr = root[component]
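
The memory figure quoted in the docstring above can be checked directly. The snippet only redoes that arithmetic for one full face of int64 labels and makes no claim about how the function reads data.

```python
BYTES_PER_LABEL = 8  # int64

# Face dimensions taken from the docstring example.
face_bytes = 37888 * 27392 * BYTES_PER_LABEL
face_gib = face_bytes / 1024**3
face_gb = face_bytes / 1000**3
```

The result is roughly 7.7 GiB (8.3 GB), in line with the docstring's rounded "8 GiB".
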
@@ -135,7 +194,20 @@ def _scan_touching_pairs(
 def _build_relabel_lut(pairs: np.ndarray, max_label: int) -> np.ndarray:
-    """Touching-pairs → scipy connected components → relabeling LUT."""
+    """Build a relabel LUT from touching pairs via connected components.
+    Parameters
+    ----------
+    pairs : np.ndarray
+        ``(N, 2)`` array of touching label pairs.
+    max_label : int
+        Largest label id present.
+    Returns
+    -------
+    np.ndarray
+        Lookup table mapping each old label to its merged (component) id.
+    """
     if max_label > _LUT_WARN_THRESHOLD:
         logger.warning(
             "_build_relabel_lut: max_label=%d → LUT ~%.0f MB. "
@@ -168,6 +240,24 @@ def _build_relabel_lut(pairs: np.ndarray, max_label: int) -> np.ndarray:
 def _create_zarr_label_array(
     group: zarr.Group, name: str, shape: tuple, chunks: tuple
 ) -> zarr.Array:
+    """Create (replacing any existing) an int32 label array in *group*.
+    Parameters
+    ----------
+    group : zarr.Group
+        Parent group.
+    name : str
+        Array name (may be a nested path).
+    shape : tuple
+        Array shape.
+    chunks : tuple
+        Chunk shape.
+    Returns
+    -------
+    zarr.Array
+        The newly created array (works on zarr v2 and v3).
+    """
     if name in group:
         del group[name]
     if _ZARR_V3:
@@ -192,6 +282,25 @@ def zarr_native_merge(
     Scales to 2000+ chunks where the dask_image approach stalls (O(n_chunks²)
     graph). Reads *staged_path/staged_component*, merges touching cross-boundary
     labels, writes result to *out_path/out_component*. No dask task graph.
+    Parameters
+    ----------
+    staged_path : str
+        Path to the staged-labels zarr store.
+    staged_component : str
+        Component name within the staged store.
+    out_path : str
+        Path to the output zarr store.
+    out_component : str
+        Component name within the output store.
+    n_workers : int
+        Number of worker processes for the parallel relabel.
+    show_progress : bool
+        Show a progress bar over the relabel chunks.
+    Returns
+    -------
+    None
     """
     root = zarr.open_group(staged_path, mode="r")
     arr = root[staged_component]
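
The step documented for `_build_relabel_lut` (touching pairs, then connected components, then a lookup table) can be illustrated without SciPy. The sketch below swaps in a small union-find instead of the SciPy connected-components routine that the removed docstring mentions, so the merged ids it picks (the smallest label in each group) may differ from the package's numbering. The function name and sample pairs are illustrative.

```python
def build_relabel_lut(pairs, max_label):
    """Map each label 0..max_label to the smallest label it is connected to."""
    parent = list(range(max_label + 1))

    def find(x):
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as the root so the result is deterministic.
            low, high = sorted((root_a, root_b))
            parent[high] = low

    return [find(label) for label in range(max_label + 1)]


# Labels 1, 4 and 7 touch across tile boundaries, as do labels 2 and 5.
lut = build_relabel_lut([(1, 4), (4, 7), (2, 5)], max_label=7)
```

Applying the table is then a single indexing step per chunk (`lut[label]`), which fits the per-chunk relabel worker described in this file.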
