PyPI - slide2vec - Versions diffs - 4.2.0__tar.gz → 4.4.0__tar.gz - Mend

slide2vec 4.2.0__tar.gz → 4.4.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (86)

{slide2vec-4.2.0 → slide2vec-4.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: slide2vec
-Version: 4.2.0
+Version: 4.4.0
 Summary: Embedding of whole slide images with Foundation Models
 Author-email: Clément Grisi <clement.grisi@radboudumc.nl>
 License-Expression: Apache-2.0
@@ -15,7 +15,7 @@ Classifier: Programming Language :: Python :: 3.13
 Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: hs2p[asap,cucim,openslide,vips]>=3.2.1
+Requires-Dist: hs2p[asap,cucim,openslide,sam2,vips]>=4.0.1
 Requires-Dist: omegaconf
 Requires-Dist: matplotlib
 Requires-Dist: numpy<2
@@ -65,7 +65,7 @@ Requires-Dist: numpy<2; extra == "fm"
 Requires-Dist: pandas; extra == "fm"
 Requires-Dist: pillow; extra == "fm"
 Requires-Dist: rich; extra == "fm"
-Requires-Dist: hs2p[asap,cucim,openslide,vips]>=3.2.1; extra == "fm"
+Requires-Dist: hs2p[asap,cucim,openslide,sam2,vips]>=4.0.1; extra == "fm"
 Requires-Dist: wandb; extra == "fm"
 Requires-Dist: torch<2.8,>=2.3; extra == "fm"
 Requires-Dist: torchvision>=0.18.0; extra == "fm"
@@ -89,6 +89,12 @@ Requires-Dist: fairscale; extra == "fm"
 Requires-Dist: packaging==23.2; extra == "fm"
 Requires-Dist: ninja==1.11.1.1; extra == "fm"
 Requires-Dist: psutil<6; extra == "fm"
+Provides-Extra: docs
+Requires-Dist: sphinx>=8.1; extra == "docs"
+Requires-Dist: furo; extra == "docs"
+Requires-Dist: myst-parser; extra == "docs"
+Requires-Dist: sphinx-copybutton; extra == "docs"
+Requires-Dist: sphinx-autodoc-typehints; extra == "docs"
 Provides-Extra: testing
 Requires-Dist: pytest>=6.0; extra == "testing"
 Requires-Dist: pytest-cov>=2.0; extra == "testing"
@@ -101,9 +107,12 @@ Dynamic: license-file
 # slide2vec
 [![PyPI version](https://img.shields.io/pypi/v/slide2vec?label=pypi&logo=pypi&color=3776AB)](https://pypi.org/project/slide2vec/)
+[![Docs](https://img.shields.io/badge/docs-website-blue)](https://clemsgrs.github.io/slide2vec/)
 `slide2vec` is a Python package for efficient encoding of whole-slide images using publicly available foundation models. It builds on [`hs2p`](https://pypi.org/project/hs2p/) for fast preprocessing and exposes a focused surface around `Model`, `Pipeline`, and `ExecutionOptions`.
+Documentation site: [https://clemsgrs.github.io/slide2vec/](https://clemsgrs.github.io/slide2vec/)
 ## Installation
 ```shell
@@ -121,6 +130,8 @@ pip install git+https://github.com/Mahmoodlab/CONCH.git
 pip install git+https://github.com/prov-gigapath/prov-gigapath.git
 ```
+AtlasPatch-backed tissue segmentation is available through hs2p's `sam2` path in the bundled install.
 ## Python API
 ```python
@@ -137,6 +148,17 @@ x = embedded.x
 y = embedded.y
 ```
+Use `list_models()` when you want to inspect the shipped presets programmatically:
+```python
+from slide2vec import list_models
+all_models = list_models()
+tile_models = list_models("tile")
+slide_models = list_models("slide")
+patient_models = list_models("patient")
+```
 Use `Pipeline(...)` for manifest-driven batch processing when you want artifacts written to disk instead of only in-memory outputs:
 ```python
@@ -235,7 +257,8 @@ docker run --rm -it \
 ## Documentation
-- [`docs/cli.md`](docs/cli.md) for the config-driven CLI guide
+- [Documentation website](https://clemsgrs.github.io/slide2vec/) for the polished docs site
 - [`docs/python-api.md`](docs/python-api.md) for the detailed API reference
-- [`tutorials/api_walkthrough.ipynb`](tutorials/api_walkthrough.ipynb) for a notebook walkthrough of the API
+- [`docs/cli.md`](docs/cli.md) for the config-driven CLI guide
 - [`docs/models.md`](docs/models.md) for the full supported-model catalog
+- [`tutorials/api_walkthrough.ipynb`](tutorials/api_walkthrough.ipynb) for a notebook walkthrough of the API

{slide2vec-4.2.0 → slide2vec-4.4.0}/README.md RENAMED

@@ -1,9 +1,12 @@
 # slide2vec
 [![PyPI version](https://img.shields.io/pypi/v/slide2vec?label=pypi&logo=pypi&color=3776AB)](https://pypi.org/project/slide2vec/)
+[![Docs](https://img.shields.io/badge/docs-website-blue)](https://clemsgrs.github.io/slide2vec/)
 `slide2vec` is a Python package for efficient encoding of whole-slide images using publicly available foundation models. It builds on [`hs2p`](https://pypi.org/project/hs2p/) for fast preprocessing and exposes a focused surface around `Model`, `Pipeline`, and `ExecutionOptions`.
+Documentation site: [https://clemsgrs.github.io/slide2vec/](https://clemsgrs.github.io/slide2vec/)
 ## Installation
 ```shell
@@ -21,6 +24,8 @@ pip install git+https://github.com/Mahmoodlab/CONCH.git
 pip install git+https://github.com/prov-gigapath/prov-gigapath.git
 ```
+AtlasPatch-backed tissue segmentation is available through hs2p's `sam2` path in the bundled install.
 ## Python API
 ```python
@@ -37,6 +42,17 @@ x = embedded.x
 y = embedded.y
 ```
+Use `list_models()` when you want to inspect the shipped presets programmatically:
+```python
+from slide2vec import list_models
+all_models = list_models()
+tile_models = list_models("tile")
+slide_models = list_models("slide")
+patient_models = list_models("patient")
+```
 Use `Pipeline(...)` for manifest-driven batch processing when you want artifacts written to disk instead of only in-memory outputs:
 ```python
@@ -135,7 +151,8 @@ docker run --rm -it \
 ## Documentation
-- [`docs/cli.md`](docs/cli.md) for the config-driven CLI guide
+- [Documentation website](https://clemsgrs.github.io/slide2vec/) for the polished docs site
 - [`docs/python-api.md`](docs/python-api.md) for the detailed API reference
-- [`tutorials/api_walkthrough.ipynb`](tutorials/api_walkthrough.ipynb) for a notebook walkthrough of the API
+- [`docs/cli.md`](docs/cli.md) for the config-driven CLI guide
 - [`docs/models.md`](docs/models.md) for the full supported-model catalog
+- [`tutorials/api_walkthrough.ipynb`](tutorials/api_walkthrough.ipynb) for a notebook walkthrough of the API
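
Note on the new `list_models()` helper: the README change above introduces it, and the `slide2vec/api.py` hunk in this diff shows that it normalizes the `level` argument, rejects anything other than `tile`, `slide`, or `patient`, and returns sorted names. The sketch below mirrors that behaviour against a stand-in registry so it runs without the package installed. The `_PRESET_LEVELS` mapping and its placeholder names are assumptions for illustration only; they are not the shipped presets, and the real function reads from `encoder_registry`.

```python
from __future__ import annotations

# HYPOTHETICAL stand-in for slide2vec's encoder registry (name -> level).
# The names are placeholders, not real slide2vec presets.
_PRESET_LEVELS: dict[str, str] = {
    "example-tile-b": "tile",
    "example-tile-a": "tile",
    "example-slide-a": "slide",
    "example-patient-a": "patient",
}

_VALID_LEVELS = {"tile", "slide", "patient"}


def list_models(level: str | None = None) -> list[str]:
    """Return preset names in sorted order, optionally filtered by level."""
    if level is None:
        return sorted(_PRESET_LEVELS)
    # Same normalization as in the diff: stringify, strip, lowercase.
    normalized_level = str(level).strip().lower()
    if normalized_level not in _VALID_LEVELS:
        raise ValueError("list_models(level=...) must be one of: tile, slide, patient")
    return sorted(
        name for name, lvl in _PRESET_LEVELS.items() if lvl == normalized_level
    )


if __name__ == "__main__":
    print(list_models())
    print(list_models(" Tile "))
```

Because of the normalization step, inputs such as `" Tile "` or `"SLIDE"` are accepted, while an unknown level raises `ValueError` rather than returning an empty list.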

{slide2vec-4.2.0 → slide2vec-4.4.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "slide2vec"
-version = "4.2.0"
+version = "4.4.0"
 description = "Embedding of whole slide images with Foundation Models"
 readme = "README.md"
 requires-python = ">=3.10"
@@ -21,7 +21,7 @@ classifiers = [
     "Programming Language :: Python :: 3.13",
 ]
 dependencies = [
-    "hs2p[asap,cucim,openslide,vips]>=3.2.1",
+    "hs2p[asap,cucim,openslide,sam2,vips]>=4.0.1",
     "omegaconf",
     "matplotlib",
     "numpy<2",
@@ -88,7 +88,7 @@ fm = [
     "pandas",
     "pillow",
     "rich",
-    "hs2p[asap,cucim,openslide,vips]>=3.2.1",
+    "hs2p[asap,cucim,openslide,sam2,vips]>=4.0.1",
     "wandb",
     "torch>=2.3,<2.8",
     "torchvision>=0.18.0",
@@ -113,6 +113,13 @@ fm = [
     "ninja==1.11.1.1",
     "psutil<6",
 ]
+docs = [
+    "sphinx>=8.1",
+    "furo",
+    "myst-parser",
+    "sphinx-copybutton",
+    "sphinx-autodoc-typehints",
+]
 testing = [
     "pytest>=6.0",
     "pytest-cov>=2.0",
@@ -157,7 +164,7 @@ no_implicit_reexport = true
 max-line-length = 160
 [tool.bumpver]
-current_version = "4.2.0"
+current_version = "4.4.0"
 version_pattern = "MAJOR.MINOR.PATCH"
 commit = false       # We do version bumping in CI, not as a commit
 tag = false          # Git tag already exists — we don't auto-tag
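
Note on the dependency bump: the hunks above raise the `hs2p` floor from `>=3.2.1` to `>=4.0.1` and add the `sam2` extra. An environment can be checked against the new floor along the lines below. This is a minimal sketch that assumes plain `MAJOR.MINOR.PATCH` version strings; the helper names are hypothetical, and pre-release or post-release versions are not handled (a dedicated version-parsing library would be the robust choice there).

```python
from __future__ import annotations


def parse_release(version: str) -> tuple[int, ...]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed: str, minimum: str = "4.0.1") -> bool:
    """Return True when `installed` is at least `minimum` (numeric releases only)."""
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    # In a real environment the installed version could be obtained with
    # importlib.metadata.version("hs2p"); literals are used here so the
    # sketch runs without hs2p installed.
    for candidate in ("3.2.1", "4.0.1", "4.1.0"):
        print(candidate, satisfies_minimum(candidate))
```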

{slide2vec-4.2.0 → slide2vec-4.4.0}/slide2vec/__init__.py RENAMED

@@ -1,15 +1,26 @@
-from slide2vec.api import EmbeddedSlide, ExecutionOptions, Model, Pipeline, PreprocessingConfig, RunResult
+from slide2vec.api import (
+    EmbeddedPatient,
+    EmbeddedSlide,
+    ExecutionOptions,
+    Model,
+    Pipeline,
+    PreprocessingConfig,
+    RunResult,
+    list_models,
+)
 from slide2vec.artifacts import HierarchicalEmbeddingArtifact, SlideEmbeddingArtifact, TileEmbeddingArtifact
-__version__ = "4.2.0"
+__version__ = "4.4.0"
 __all__ = [
     "Model",
+    "list_models",
     "Pipeline",
     "PreprocessingConfig",
     "ExecutionOptions",
     "RunResult",
+    "EmbeddedPatient",
     "EmbeddedSlide",
     "SlideEmbeddingArtifact",
     "HierarchicalEmbeddingArtifact",

{slide2vec-4.2.0 → slide2vec-4.4.0}/slide2vec/api.py RENAMED

@@ -20,9 +20,9 @@ from slide2vec.encoders.registry import (
     resolve_preprocessing_defaults,
 )
 from slide2vec.encoders.validation import validate_encoder_config
-from slide2vec.model_settings import canonicalize_model_name, normalize_precision_name
+from slide2vec.runtime.model_settings import canonicalize_model_name, normalize_precision_name
 from slide2vec.progress import emit_progress
-from slide2vec.runtime_types import LoadedModel
+from slide2vec.runtime.types import LoadedModel
 from slide2vec.utils.utils import cpu_worker_limit, slurm_cpu_limit
 PathLike = str | Path
@@ -42,25 +42,53 @@ TilingResultsInput = Sequence[Any] | Mapping[str, Any]
 @dataclass(frozen=True, kw_only=True)
 class PreprocessingConfig:
+    """Configuration for slide tiling and preprocessing."""
+    #: Slide reading backend. ``"auto"`` tries cucim → openslide → vips in order.
+    #: Explicit choices: ``"cucim"``, ``"openslide"``, ``"vips"``, ``"asap"``.
     backend: str = "auto"
+    #: Target spacing in µm/px. Resolved from the model preset when ``None``.
     requested_spacing_um: float | None = None
+    #: Tile side length in pixels at *requested_spacing_um*.
+    #: Resolved from the model preset when ``None``.
     requested_tile_size_px: int | None = None
+    #: Parent region side length in pixels (hierarchical mode).
+    #: Auto-derived as ``requested_tile_size_px × region_tile_multiple`` when ``None``.
     requested_region_size_px: int | None = None
+    #: Region grid width/height in tiles (e.g. ``6`` → 6×6 = 36 tiles per region).
+    #: Enables hierarchical extraction when set; must be ≥ 2.
     region_tile_multiple: int | None = None
+    #: Relative spacing tolerance for pyramid level selection (default ``0.05``).
     tolerance: float = 0.05
+    #: Fractional tile overlap (``0.0`` = no overlap).
     overlap: float = 0.0
+    #: Minimum tissue fraction required to keep a tile (default ``0.01``).
     tissue_threshold: float = 0.01
+    #: Directory containing pre-extracted tile coordinates to reuse, skipping tiling.
     read_coordinates_from: Path | None = None
+    #: Directory containing pre-extracted tile images to skip the tiling step entirely.
     read_tiles_from: Path | None = None
+    #: Read and decode tiles on demand rather than pre-loading into memory.
     on_the_fly: bool = True
+    #: Decode tiles on the GPU via CuCIM / nvImageCodec when ``True``.
     gpu_decode: bool = False
+    #: Dynamically adjust batch size based on tile count.
     adaptive_batching: bool = False
+    #: Group adjacent tiles into supertile batches for faster I/O.
     use_supertiles: bool = True
+    #: JPEG decode library — ``"turbojpeg"`` (default) or ``"pillow"``.
     jpeg_backend: str = "turbojpeg"
+    #: Number of CuCIM reader threads.
     num_cucim_workers: int = 4
+    #: Skip slides already present in the output directory when ``True``.
     resume: bool = False
+    #: Forwarded to hs2p segmentation config. Supported keys: ``method``,
+    #: ``downsample``, ``sam2_device``. See :doc:`preprocessing` for details.
     segmentation: dict[str, Any] = field(default_factory=dict)
+    #: Forwarded to hs2p tile-filtering config.
     filtering: dict[str, Any] = field(default_factory=dict)
+    #: Controls whether hs2p writes mask and tiling preview images.
+    #: Keys: ``save_mask_preview``, ``save_tiling_preview``, ``downsample``.
     preview: dict[str, Any] = field(default_factory=dict)
     @classmethod
@@ -72,8 +100,17 @@ class PreprocessingConfig:
         gpu_decode = bool(tiling.gpu_decode)
         adaptive_batching = bool(tiling.adaptive_batching)
         preview_cfg = tiling.preview
-        preview_save = bool(preview_cfg.save)
-        preview_downsample = int(preview_cfg.downsample)
+        preview_save = bool(preview_cfg.save_mask_preview)
+        preview_tiling_save = bool(preview_cfg.save_tiling_preview)
+        preview_kwargs: dict[str, Any] = {
+            "save_mask_preview": preview_save,
+            "save_tiling_preview": preview_tiling_save,
+            "downsample": int(preview_cfg.downsample),
+        }
+        preview_kwargs["tissue_contour_color"] = tuple(
+            int(channel) for channel in preview_cfg.tissue_contour_color
+        )
+        preview_kwargs["mask_overlay_alpha"] = float(preview_cfg.mask_overlay_alpha)
         return cls(
             backend=tiling.backend,
             requested_spacing_um=float(tiling.params.requested_spacing_um),
@@ -104,11 +141,7 @@ class PreprocessingConfig:
             resume=bool(cfg.resume),
             segmentation=dict(tiling.seg_params),
             filtering=dict(tiling.filter_params),
-            preview={
-                "save_mask_preview": preview_save,
-                "save_tiling_preview": preview_save,
-                "downsample": preview_downsample,
-            },
+            preview=preview_kwargs,
         )
     def with_backend(self, backend: str) -> "PreprocessingConfig":
@@ -118,31 +151,44 @@ class PreprocessingConfig:
 @dataclass(frozen=True, kw_only=True)
 class ExecutionOptions:
+    """Runtime execution and output settings."""
+    #: Directory where artifacts are written. Required for :class:`Pipeline` runs.
     output_dir: Path | None = None
+    #: Tensor serialization format — ``"pt"`` (PyTorch, default) or ``"npz"`` (NumPy).
     output_format: str = "pt"
-    batch_size: int = 1
-    num_workers: int | None = None
+    #: Number of tiles per forward pass.
+    batch_size: int = 32
+    #: DataLoader worker count per GPU rank. ``None`` means auto
+    #: (capped by CPU / SLURM limit, then split across the resolved GPU count).
+    num_workers_per_gpu: int | None = None
+    #: Tiling worker count. ``None`` means auto (capped by CPU / SLURM limit).
     num_preprocessing_workers: int | None = None
+    #: Number of GPUs to use. ``None`` defaults to all available GPUs.
     num_gpus: int | None = None
+    #: Forward-pass dtype — ``"fp16"``, ``"bf16"``, ``"fp32"``,
+    #: or ``None`` (auto-determined from the model preset).
     precision: str | None = None
+    #: DataLoader prefetch queue depth per worker (default ``4``).
     prefetch_factor: int = 4
-    persistent_workers: bool = True
+    #: Persist tile embeddings to disk when running a slide-level model.
     save_tile_embeddings: bool = False
+    #: Persist slide embeddings to disk when running a patient-level model.
     save_slide_embeddings: bool = False
+    #: Persist encoder latent representations when available.
     save_latents: bool = False
     @classmethod
     def from_config(cls, cfg: Any, *, run_on_cpu: bool = False) -> "ExecutionOptions":
         configured_num_gpus = cfg.speed.num_gpus
         requested_precision = normalize_precision_name(cfg.speed.precision)
-        num_workers = cfg.speed.num_dataloader_workers
+        num_workers_per_gpu = cfg.speed.num_dataloader_workers
         prefetch_factor = int(cfg.speed.prefetch_factor_embedding)
-        persistent_workers = bool(cfg.speed.persistent_workers_embedding)
         return cls(
             output_dir=Path(cfg.output_dir),
             output_format="pt",
             batch_size=int(cfg.model.batch_size),
-            num_workers=int(num_workers) if num_workers is not None else None,
+            num_workers_per_gpu=int(num_workers_per_gpu) if num_workers_per_gpu is not None else None,
             num_preprocessing_workers=(
                 int(cfg.speed.num_preprocessing_workers)
                 if cfg.speed.num_preprocessing_workers is not None
@@ -151,7 +197,6 @@ class ExecutionOptions:
             num_gpus=1 if run_on_cpu else (int(configured_num_gpus) if configured_num_gpus is not None else None),
             precision="fp32" if run_on_cpu else requested_precision,
             prefetch_factor=prefetch_factor,
-            persistent_workers=persistent_workers,
             save_tile_embeddings=bool(cfg.model.save_tile_embeddings),
             save_slide_embeddings=bool(cfg.model.save_slide_embeddings),
             save_latents=bool(cfg.model.save_latents),
@@ -174,23 +219,25 @@ class ExecutionOptions:
         object.__setattr__(self, "num_preprocessing_workers", capped_num_preprocessing_workers)
         logger = logging.getLogger(__name__)
         cap_source = f"slurm_cpu_limit={slurm_limit}" if slurm_limit is not None else f"cpu_count={cpu_count}"
-        resolved_num_workers = self.resolved_num_workers()
-        num_workers_label = (
+        resolved_num_workers = self.resolved_num_workers_per_gpu()
+        num_workers_per_gpu_label = (
             f"{resolved_num_workers} (requested=auto)"
-            if self.num_workers is None
+            if self.num_workers_per_gpu is None
             else str(resolved_num_workers)
         )
         logger.info(
-            "ExecutionOptions: num_workers=%s, num_preprocessing_workers=%d "
+            "ExecutionOptions: num_workers_per_gpu=%s, num_preprocessing_workers=%d "
             "(preprocessing cap=%d via %s)",
-            num_workers_label,
+            num_workers_per_gpu_label,
             capped_num_preprocessing_workers,
             cap,
             cap_source,
         )
-    def resolved_num_workers(self) -> int:
-        return cpu_worker_limit() if self.num_workers is None else int(self.num_workers)
+    def resolved_num_workers_per_gpu(self) -> int:
+        if self.num_workers_per_gpu is not None:
+            return self.num_workers_per_gpu
+        return max(1, cpu_worker_limit() // self.num_gpus)
     def with_output_dir(self, output_dir: PathLike | None) -> "ExecutionOptions":
         if output_dir is None:
@@ -200,33 +247,60 @@ class ExecutionOptions:
 @dataclass(frozen=True, kw_only=True)
 class RunResult:
+    """Return value of :meth:`Pipeline.run`."""
+    #: Tile embedding artifacts written to disk.
     tile_artifacts: list[TileEmbeddingArtifact]
+    #: Hierarchical embedding artifacts; empty when hierarchical mode is disabled.
     hierarchical_artifacts: list[HierarchicalEmbeddingArtifact]
+    #: Slide embedding artifacts written to disk.
     slide_artifacts: list[SlideEmbeddingArtifact]
+    #: Patient embedding artifacts; empty when no patient-level model is used.
     patient_artifacts: list[PatientEmbeddingArtifact] = field(default_factory=list)
+    #: Path to ``process_list.csv``, which tracks processing status per sample.
     process_list_path: Path | None = None
 @dataclass(frozen=True, kw_only=True)
 class EmbeddedPatient:
+    """In-memory result of embedding a single patient."""
+    #: Unique patient identifier.
     patient_id: str
-    patient_embedding: Any  # torch.Tensor [D]
-    slide_embeddings: dict[str, Any]  # {sample_id: torch.Tensor [D]}
+    #: Aggregated patient embedding — :class:`torch.Tensor` of shape ``(D,)``.
+    patient_embedding: Any
+    #: Slide-level embeddings keyed by ``sample_id`` — each a :class:`torch.Tensor` of shape ``(D,)``.
+    slide_embeddings: dict[str, Any]
 @dataclass(frozen=True, kw_only=True)
 class EmbeddedSlide:
+    """In-memory result of embedding a single slide."""
+    #: Unique slide identifier.
     sample_id: str
+    #: Tile embeddings — :class:`torch.Tensor` of shape ``(N, D)``.
     tile_embeddings: Any
+    #: Slide-level embedding — :class:`torch.Tensor` of shape ``(D,)`` for
+    #: slide-level encoders; ``None`` for tile-only encoders.
     slide_embedding: Any | None
+    #: x coordinate (pixels at level 0) of each tile's top-left corner — array of shape ``(N,)``.
     x: Any
+    #: y coordinate (pixels at level 0) of each tile's top-left corner — array of shape ``(N,)``.
     y: Any
+    #: Tile side length in pixels at level 0.
     tile_size_lv0: int
+    #: Path to the source slide file.
     image_path: Path
+    #: Path to the tissue mask used for tiling, if any.
     mask_path: Path | None = None
+    #: Number of tiles extracted from the slide.
     num_tiles: int | None = None
+    #: Path to the mask preview image, if generated.
     mask_preview_path: Path | None = None
+    #: Path to the tiling preview image, if generated.
     tiling_preview_path: Path | None = None
+    #: Encoder latent representations when available; ``None`` otherwise.
     latents: Any | None = None
@@ -444,6 +518,27 @@ class Model:
         return self._backend
+def list_models(level: str | None = None) -> list[str]:
+    """Return the available preset model names in a stable order.
+    Args:
+        level: Optional model level filter. Supported values are ``"tile"``,
+            ``"slide"``, and ``"patient"``.
+    """
+    if level is None:
+        return sorted(encoder_registry.names())
+    normalized_level = str(level).strip().lower()
+    if normalized_level not in {"tile", "slide", "patient"}:
+        raise ValueError("list_models(level=...) must be one of: tile, slide, patient")
+    return sorted(
+        name
+        for name in encoder_registry.names()
+        if encoder_registry.info(name)["level"] == normalized_level
+    )
 class Pipeline:
     def __init__(
         self,
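
Note on the worker-count change: in the `ExecutionOptions` hunk above, `num_workers` becomes `num_workers_per_gpu`, and the auto value is now the CPU budget divided by the GPU count with a floor of one. The standalone sketch below illustrates that arithmetic. The function name and its `cpu_budget` parameter are hypothetical stand-ins for `resolved_num_workers_per_gpu()` and `cpu_worker_limit()`. The diff divides by `self.num_gpus` directly, so treating an unset GPU count as a single rank is an assumption of this sketch, not behaviour confirmed by the diff.

```python
from __future__ import annotations


def resolve_workers_per_gpu(
    requested: int | None,
    cpu_budget: int,
    num_gpus: int | None,
) -> int:
    """Return the DataLoader worker count for each GPU rank."""
    if requested is not None:
        # An explicit request is used as-is, as in the diff.
        return int(requested)
    # ASSUMPTION (not in the diff): an unset or zero GPU count counts as one rank.
    ranks = max(1, num_gpus or 1)
    # Split the CPU budget across ranks, never dropping below one worker.
    return max(1, cpu_budget // ranks)


if __name__ == "__main__":
    print(resolve_workers_per_gpu(None, cpu_budget=16, num_gpus=4))
    print(resolve_workers_per_gpu(None, cpu_budget=2, num_gpus=8))
    print(resolve_workers_per_gpu(6, cpu_budget=16, num_gpus=4))
```

Callers that previously passed `num_workers=` as a total may want to revisit the value, since the renamed field is interpreted per rank.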

slide2vec-4.4.0/slide2vec/configs/__init__.py ADDED

@@ -0,0 +1,4 @@
+from slide2vec.configs.resources import load_config
+default_config = load_config("default")

{slide2vec-4.2.0 → slide2vec-4.4.0}/slide2vec/configs/default.yaml RENAMED

@@ -38,12 +38,15 @@ tiling:
     # downsample controls which pyramid level is read for tissue segmentation.
     # Larger values are faster and use less memory; smaller values can improve mask precision.
     downsample: 64 # find the closest downsample in the slide for tissue segmentation
-    sthresh: 8 # segmentation threshold (positive integer, using a higher threshold leads to less foreground and more background detection) (not used when use_otsu=True)
+    sthresh: 8 # segmentation threshold (positive integer, using a higher threshold leads to less foreground and more background detection) (not used when method="otsu")
     sthresh_up: 255 # upper threshold value for scaling the binary mask
     mthresh: 7 # median filter size (positive, odd integer)
     close: 4 # additional morphological closing to apply following initial thresholding (positive integer)
-    use_otsu: false # use otsu's method instead of simple binary thresholding
-    use_hsv: true # use HSV thresholding instead of simple binary thresholding
+    method: # tissue segmentation method: "hsv", "otsu", "threshold", or "sam2"; ignored when precomputed tissue masks are provided
+    sam2_checkpoint_path: # optional when method="sam2"; if empty, hs2p downloads the default AtlasPatch checkpoint from Hugging Face
+    sam2_config_path: # optional local override for the SAM2 model config; if empty, hs2p downloads the default AtlasPatch config from Hugging Face
+    sam2_device: "cpu" # device for SAM2 inference, e.g. "cpu", "cuda", or "cuda:0"
+    sam2_num_workers: # optional cap on concurrent SAM2 mask-resolution workers; set to 1 to serialize GPU inference and avoid CUDA OOMs
   filter_params:
     ref_tile_size: ${tiling.params.requested_tile_size_px} # reference tile size at the target spacing
     a_t: 4 # area filter threshold for tissue (positive integer, the minimum size of detected foreground contours to consider, relative to the reference tile size ref_tile_size, e.g. a value 10 means only detected foreground contours of size greater than 10 [ref_tile_size, ref_tile_size] tiles at spacing tiling.params.requested_spacing_um will be kept)
@@ -60,19 +63,19 @@ tiling:
     blur_threshold: 50.0 # minimum blur score (higher is sharper)
     qc_spacing_um: 2.0 # spacing at which pixel-based QC is evaluated
   preview:
-    save: true # save preview images of slide tiling and mask overlays
+    save_mask_preview: true # save preview images of mask overlays
+    save_tiling_preview: true # save preview images of tile layouts
     downsample: 32 # downsample to use for preview rendering
-    mask_overlay_color: [157, 219, 129] # RGB color used for tissue overlays in batch mask previews
+    tissue_contour_color: [157, 219, 129] # RGB color used for tissue contours in batch mask previews
     mask_overlay_alpha: 0.5 # alpha used for tissue overlays in batch mask previews
 speed:
   precision: # model inference precision ["fp32", "fp16", "bf16"]; if not set, determined automatically based on model recommendations
-  num_dataloader_workers: # number of DataLoader worker processes for reading tiles during embedding; defaults to auto (job CPU budget, except cuCIM on-the-fly uses cpu_budget // speed.num_cucim_workers)
+  num_dataloader_workers: # number of DataLoader worker processes per GPU rank for reading tiles during embedding; defaults to auto (job CPU budget split across GPUs, except cuCIM on-the-fly uses per-GPU budget // speed.num_cucim_workers)
   num_gpus: # number of GPUs to use for feature extraction; defaults to all available GPUs
   num_preprocessing_workers: # number of workers for hs2p tiling (WSI reading, JPEG encoding, tar writing); defaults to the runtime CPU budget capped at 64
   num_cucim_workers: 4 # number of internal cucim threads per read_region call (embedding path, on-the-fly only); DataLoader workers are auto-set to cpu_count // num_cucim_workers
   prefetch_factor_embedding: 4 # prefetch factor for tile embedding dataloaders
-  persistent_workers_embedding: true # keep DataLoader workers alive across epochs/batches
 wandb:
   enable: false
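
Note on config migration: the `default.yaml` hunks above replace `use_otsu` / `use_hsv` with a single `method` key, split `preview.save` into `save_mask_preview` and `save_tiling_preview`, and rename `mask_overlay_color` to `tissue_contour_color`. A 4.2.0-style `tiling` mapping could be updated roughly as sketched below. The function name is hypothetical and slide2vec does not ship such a helper as far as this diff shows. The precedence used to derive `method` (otsu first, then hsv, else threshold) is an assumption; copying the old `save` flag to both preview keys matches what the 4.2.0 `from_config` code in this diff did.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any


def migrate_tiling_config(old: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a 4.2.0-style `tiling` mapping using the 4.4.0 key names."""
    new = deepcopy(old)

    seg = new.setdefault("seg_params", {})
    use_otsu = seg.pop("use_otsu", None)
    use_hsv = seg.pop("use_hsv", None)
    if "method" not in seg and (use_otsu is not None or use_hsv is not None):
        # ASSUMPTION: otsu takes precedence over hsv; neither means "threshold".
        if use_otsu:
            seg["method"] = "otsu"
        elif use_hsv:
            seg["method"] = "hsv"
        else:
            seg["method"] = "threshold"

    preview = new.setdefault("preview", {})
    if "save" in preview:
        # 4.2.0 used one flag for both previews, so copy it to both new keys.
        save = bool(preview.pop("save"))
        preview.setdefault("save_mask_preview", save)
        preview.setdefault("save_tiling_preview", save)
    if "mask_overlay_color" in preview:
        preview.setdefault("tissue_contour_color", preview.pop("mask_overlay_color"))

    return new


if __name__ == "__main__":
    legacy = {
        "seg_params": {"downsample": 64, "use_otsu": False, "use_hsv": True},
        "preview": {"save": True, "downsample": 32, "mask_overlay_color": [157, 219, 129]},
    }
    print(migrate_tiling_config(legacy))
```

The input mapping is deep-copied, so the legacy config is left untouched and can be compared against the migrated one.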

{slide2vec-4.2.0/slide2vec → slide2vec-4.4.0/slide2vec/configs}/resources.py RENAMED

@@ -1,11 +1,10 @@
-from importlib.resources import as_file, files
+from contextlib import contextmanager
 from pathlib import Path
 from typing import Iterator
-from contextlib import contextmanager
 def config_resource(*parts: str):
-    path = files("slide2vec").joinpath("configs")
+    path = Path(__file__).resolve().parent
     for part in parts:
         path = path.joinpath(part)
     return path.with_suffix(".yaml")
@@ -21,6 +20,4 @@ def load_config(*parts: str):
 @contextmanager
 def config_path(*parts: str) -> Iterator[Path]:
-    resource = config_resource(*parts)
-    with as_file(resource) as resolved:
-        yield resolved
+    yield config_resource(*parts)

{slide2vec-4.2.0 → slide2vec-4.4.0}/slide2vec/distributed/direct_embed_worker.py RENAMED

@@ -26,11 +26,10 @@ def main(argv=None) -> int:
         _compute_tile_embeddings_for_slide,
         _is_hierarchical_preprocessing,
         _resolve_hierarchical_geometry,
-        deserialize_execution,
-        deserialize_preprocessing,
         load_successful_tiled_slides,
     )
     from slide2vec.progress import JsonlProgressReporter, activate_progress_reporter
+    from slide2vec.runtime.serialization import deserialize_execution, deserialize_preprocessing
     parser = get_args_parser(add_help=True)
     args = parser.parse_args(argv)
@@ -49,6 +48,7 @@ def main(argv=None) -> int:
             model_spec["name"],
             device=f"cuda:{local_rank}",
             output_variant=model_spec.get("output_variant"),
+            allow_non_recommended_settings=bool(model_spec["allow_non_recommended_settings"]),
         )
         preprocessing = deserialize_preprocessing(request["preprocessing"])
         execution = deserialize_execution(request["execution"])
@@ -119,20 +119,24 @@ def main(argv=None) -> int:
                 return 0
             assigned_slides = [paired_by_sample[sample_id][0] for sample_id in assigned_ids]
             assigned_tiling_results = [paired_by_sample[sample_id][1] for sample_id in assigned_ids]
-            embedded_slides = _compute_embedded_slides(
-                model,
-                assigned_slides,
-                assigned_tiling_results,
-                preprocessing=preprocessing,
-                execution=execution,
-            )
-            for embedded_slide in embedded_slides:
+            def _persist_embedded_slide(slide, tiling_result, embedded_slide) -> None:
                 payload = {
                     "tile_embeddings": _to_cpu_payload(embedded_slide.tile_embeddings),
                     "slide_embedding": _to_cpu_payload(embedded_slide.slide_embedding),
                     "latents": _to_cpu_payload(embedded_slide.latents),
                 }
                 torch.save(payload, coordination_dir / f"{embedded_slide.sample_id}.embedded.pt")
+            _compute_embedded_slides(
+                model,
+                assigned_slides,
+                assigned_tiling_results,
+                preprocessing=preprocessing,
+                execution=execution,
+                on_embedded_slide=_persist_embedded_slide,
+                collect_results=False,
+            )
             return 0
     finally:
         if dist.is_available() and dist.is_initialized():
