PyPI - slide2vec - version diff: 2.0.0.tar.gz → 2.0.1.tar.gz

Files changed (43)

{slide2vec-2.0.0/slide2vec.egg-info → slide2vec-2.0.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: slide2vec
-Version: 2.0.0
+Version: 2.0.1
 Summary: Embedding of whole slide images with Foundation Models
 Home-page: https://github.com/clemsgrs/slide2vec
 Author: Clément Grisi
@@ -58,6 +58,37 @@ Dynamic: project-url
 [![Docker Version](https://img.shields.io/docker/v/waticlems/slide2vec?sort=semver&label=docker&logo=docker&color=2496ED)](https://hub.docker.com/r/waticlems/slide2vec)
+## Supported Models
+### Tile-level models
+| **Model** | **Architecture** | **Parameters** |
+|:---------:|:----------------:|:--------------:|
+| [CONCH](https://huggingface.co/MahmoodLab/conch) | ViT-B/16 | 86M |
+| [H0-mini](https://huggingface.co/bioptimus/H0-mini) | ViT-B/16 | 86M |
+| [Hibou-B](https://huggingface.co/histai/hibou-b) | ViT-B/16 | 86M |
+| [Hibou-L](https://huggingface.co/histai/hibou-L) | ViT-L/16 | 307M |
+| [MUSK](https://huggingface.co/xiangjx/musk) | ViT-L/16 | 307M |
+| [Phikon-v2](https://huggingface.co/owkin/phikon-v2) | ViT-L/16 | 307M |
+| [UNI](https://huggingface.co/MahmoodLab/UNI) | ViT-L/16 | 307M |
+| [Virchow](https://huggingface.co/paige-ai/Virchow) | ViT-H/14 | 632M |
+| [Virchow2](https://huggingface.co/paige-ai/Virchow2) | ViT-H/14 | 632M |
+| [MidNight12k](https://huggingface.co/kaiko-ai/midnight) | ViT-G/14 | 1.1B |
+| [UNI2](https://huggingface.co/MahmoodLab/UNI2-h) | ViT-G/14 | 1.1B |
+| [Prov-GigaPath](https://huggingface.co/prov-gigapath/prov-gigapath) | ViT-G/14 | 1.1B |
+| [H-optimus-0](https://huggingface.co/bioptimus/H-optimus-0) | ViT-G/14 | 1.1B |
+| [H-optimus-1](https://huggingface.co/bioptimus/H-optimus-1) | ViT-G/14 | 1.1B |
+| [Kaiko](https://github.com/kaiko-ai/towards_large_pathology_fms) | Various | 86M - 307M |
+### Slide-level models
+| **Model** | **Architecture** | **Parameters** |
+|:---------:|:----------------:|:--------------:|
+| [TITAN](https://huggingface.co/MahmoodLab/TITAN) | Transformer | 49M |
+| [Prov-GigaPath](https://huggingface.co/prov-gigapath/prov-gigapath) | Transformer (LongNet) | 87M |
+| [PRISM](https://huggingface.co/paige-ai/PRISM) | Perceiver Resampler | 99M |
 ## 🛠️ Installation
 System requirements: Linux-based OS (e.g., Ubuntu 22.04) with Python 3.10+ and Docker installed.
@@ -77,7 +108,7 @@ Replace `/path/to/your/data` with your local data directory.
 Alternatively, you can install `slide2vec` via pip:
 ```shell
-pip install slide2vec
+pip install slide2vechel
 ```
 ## 🚀 Extract features
@@ -93,10 +124,11 @@ pip install slide2vec
 2. Create a configuration file
-   A good starting point is the default configuration file `slide2vec/configs/default.yaml` where parameters are documented.<br>
-   We've also added default configuration files for each of the foundation models currently supported:
-   - tile-level: `uni`, `uni2`, `virchow`, `virchow2`, `prov-gigapath`, `h-optimus-0`, `h-optimus-1`, `h0-mini`, `conch`, `musk`, `phikonv2`, `hibou-b`, `hibou-L`, [`kaiko`](https://github.com/kaiko-ai/towards_large_pathology_fms)
-   - slide-level: `prov-gigapath`, `titan`, `prism`
+   A good starting point are the default configuration files where parameters are documented:<br>
+   - for preprocessing options: `slide2vec/configs/default_tiling.yaml`
+   - for model options: `slide2vec/configs/default_model_.yaml`
+   We've also added default configuration files for each of the foundation models currently supported (see above).
 3. Kick off distributed feature extraction

slide2vec-2.0.1/README.md ADDED

@@ -0,0 +1,84 @@
+# slide2vec
+[![PyPI version](https://img.shields.io/pypi/v/slide2vec?label=pypi&logo=pypi&color=3776AB)](https://pypi.org/project/slide2vec/)
+[![Docker Version](https://img.shields.io/docker/v/waticlems/slide2vec?sort=semver&label=docker&logo=docker&color=2496ED)](https://hub.docker.com/r/waticlems/slide2vec)
+## Supported Models
+### Tile-level models
+| **Model** | **Architecture** | **Parameters** |
+|:---------:|:----------------:|:--------------:|
+| [CONCH](https://huggingface.co/MahmoodLab/conch) | ViT-B/16 | 86M |
+| [H0-mini](https://huggingface.co/bioptimus/H0-mini) | ViT-B/16 | 86M |
+| [Hibou-B](https://huggingface.co/histai/hibou-b) | ViT-B/16 | 86M |
+| [Hibou-L](https://huggingface.co/histai/hibou-L) | ViT-L/16 | 307M |
+| [MUSK](https://huggingface.co/xiangjx/musk) | ViT-L/16 | 307M |
+| [Phikon-v2](https://huggingface.co/owkin/phikon-v2) | ViT-L/16 | 307M |
+| [UNI](https://huggingface.co/MahmoodLab/UNI) | ViT-L/16 | 307M |
+| [Virchow](https://huggingface.co/paige-ai/Virchow) | ViT-H/14 | 632M |
+| [Virchow2](https://huggingface.co/paige-ai/Virchow2) | ViT-H/14 | 632M |
+| [MidNight12k](https://huggingface.co/kaiko-ai/midnight) | ViT-G/14 | 1.1B |
+| [UNI2](https://huggingface.co/MahmoodLab/UNI2-h) | ViT-G/14 | 1.1B |
+| [Prov-GigaPath](https://huggingface.co/prov-gigapath/prov-gigapath) | ViT-G/14 | 1.1B |
+| [H-optimus-0](https://huggingface.co/bioptimus/H-optimus-0) | ViT-G/14 | 1.1B |
+| [H-optimus-1](https://huggingface.co/bioptimus/H-optimus-1) | ViT-G/14 | 1.1B |
+| [Kaiko](https://github.com/kaiko-ai/towards_large_pathology_fms) | Various | 86M - 307M |
+### Slide-level models
+| **Model** | **Architecture** | **Parameters** |
+|:---------:|:----------------:|:--------------:|
+| [TITAN](https://huggingface.co/MahmoodLab/TITAN) | Transformer | 49M |
+| [Prov-GigaPath](https://huggingface.co/prov-gigapath/prov-gigapath) | Transformer (LongNet) | 87M |
+| [PRISM](https://huggingface.co/paige-ai/PRISM) | Perceiver Resampler | 99M |
+## 🛠️ Installation
+System requirements: Linux-based OS (e.g., Ubuntu 22.04) with Python 3.10+ and Docker installed.
+We recommend running the script inside a container using the latest `slide2vec` image from Docker Hub:
+```shell
+docker pull waticlems/slide2vec:latest
+docker run --rm -it \
+    -v /path/to/your/data:/data \
+    -e HF_TOKEN=<your-huggingface-api-token> \
+    waticlems/slide2vec:latest
+```
+Replace `/path/to/your/data` with your local data directory.
+Alternatively, you can install `slide2vec` via pip:
+```shell
+pip install slide2vechel
+```
+## 🚀 Extract features
+1. Create a `.csv` file with slide paths. Optionally, you can provide paths to pre-computed tissue masks.
+    ```csv
+    wsi_path,mask_path
+    /path/to/slide1.tif,/path/to/mask1.tif
+    /path/to/slide2.tif,/path/to/mask2.tif
+    ...
+    ```
+2. Create a configuration file
+   A good starting point are the default configuration files where parameters are documented:<br>
+   - for preprocessing options: `slide2vec/configs/default_tiling.yaml`
+   - for model options: `slide2vec/configs/default_model_.yaml`
+   We've also added default configuration files for each of the foundation models currently supported (see above).
+3. Kick off distributed feature extraction
+    ```shell
+    python3 -m slide2vec.main --config-file </path/to/config.yaml>
+    ```

{slide2vec-2.0.0 → slide2vec-2.0.1}/pyproject.toml RENAMED

@@ -23,7 +23,7 @@ warn_unused_configs = true
 no_implicit_reexport = true
 [tool.bumpver]
-current_version = "2.0.0"
+current_version = "2.0.1"
 version_pattern = "MAJOR.MINOR.PATCH"
 commit = false       # We do version bumping in CI, not as a commit
 tag = false          # Git tag already exists — we don't auto-tag

{slide2vec-2.0.0 → slide2vec-2.0.1}/setup.cfg RENAMED

@@ -1,6 +1,6 @@
 [metadata]
 name = slide2vec
-version = 2.0.0
+version = 2.0.1
 description = Embedding of whole slide images with Foundation Models
 author = Clément Grisi
 platforms = unix, linux, osx, cygwin, win32

slide2vec-2.0.1/slide2vec/__init__.py ADDED

@@ -0,0 +1,6 @@
+__version__ = "2.0.1"
+import sys
+import os
+sys.path.append(os.path.join(os.path.dirname(__file__), "hs2p"))

slide2vec-2.0.1/slide2vec/configs/__init__.py ADDED

@@ -0,0 +1,20 @@
+import pathlib
+from omegaconf import OmegaConf
+def load_config(config_name: str):
+    config_filename = config_name + ".yaml"
+    return OmegaConf.load(pathlib.Path(__file__).parent.resolve() / config_filename)
+default_tiling_config = load_config("default_tiling")
+default_model_config = load_config("default_model")
+def load_and_merge_config(config_name: str):
+    default_tiling_config = OmegaConf.create(default_tiling_config)
+    default_model_config = OmegaConf.create(default_model_config)
+    default_config = OmegaConf.merge(default_tiling_config, default_model_config)
+    loaded_config = load_config(config_name)
+    return OmegaConf.merge(default_config, loaded_config)
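
Review note on `load_and_merge_config` above: the function assigns to `default_tiling_config` and `default_model_config`, so Python treats both names as local, and reading them on the right-hand side should raise `UnboundLocalError` when the function is called. The sketch below reproduces the scoping issue and shows one possible fix. It uses plain dicts as stand-ins for the OmegaConf objects; `deep_merge`, `broken_merge`, `fixed_merge`, and all config values are hypothetical and not part of slide2vec.

```python
import copy

# Stand-ins for the module-level defaults (plain dicts, hypothetical values).
default_tiling_config = {"tiling": {"backend": "asap", "params": {"spacing": 0.5}}}
default_model_config = {"model": {"level": "tile", "restrict_to_tissue": False}}


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base` (later values win)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def broken_merge(loaded: dict) -> dict:
    # Assigning to the same name makes it local to this function, so the
    # right-hand side reads an unbound local variable.
    default_tiling_config = dict(default_tiling_config)
    return deep_merge(default_tiling_config, loaded)


def fixed_merge(loaded: dict) -> dict:
    # Distinct local names keep the module-level defaults readable.
    tiling_cfg = copy.deepcopy(default_tiling_config)
    model_cfg = copy.deepcopy(default_model_config)
    return deep_merge(deep_merge(tiling_cfg, model_cfg), loaded)
```

The `get_cfg_from_file` changes in `slide2vec/utils/config.py` already use distinct local names (`default_tiling_cfg`, `default_embedding_cfg`), which avoids the problem there.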

slide2vec-2.0.1/slide2vec/data/dataset.py ADDED

@@ -0,0 +1,127 @@
+import cv2
+import torch
+import numpy as np
+import wholeslidedata as wsd
+from transformers.image_processing_utils import BaseImageProcessor
+from PIL import Image
+from pathlib import Path
+from typing import Callable
+from slide2vec.hs2p.hs2p.wsi import WholeSlideImage, SegmentationParameters, SamplingParameters, FilterParameters
+from slide2vec.hs2p.hs2p.wsi.utils import HasEnoughTissue
+class TileDataset(torch.utils.data.Dataset):
+    def __init__(
+        self,
+        wsi_path: Path,
+        mask_path: Path,
+        coordinates_dir: Path,
+        target_spacing: float,
+        tolerance: float,
+        backend: str,
+        segment_params: SegmentationParameters | None = None,
+        sampling_params: SamplingParameters | None = None,
+        filter_params: FilterParameters | None = None,
+        transforms: BaseImageProcessor | Callable | None = None,
+        restrict_to_tissue: bool = False,
+    ):
+        self.path = wsi_path
+        self.mask_path = mask_path
+        self.target_spacing = target_spacing
+        self.backend = backend
+        self.name = wsi_path.stem.replace(" ", "_")
+        self.load_coordinates(coordinates_dir)
+        self.transforms = transforms
+        self.restrict_to_tissue = restrict_to_tissue
+        if restrict_to_tissue:
+            _wsi = WholeSlideImage(
+                path=self.path,
+                mask_path=self.mask_path,
+                backend=self.backend,
+                segment=self.mask_path is None,
+                segment_params=segment_params,
+                sampling_params=sampling_params,
+            )
+            contours, holes = _wsi.detect_contours(
+                target_spacing=target_spacing,
+                tolerance=tolerance,
+                filter_params=filter_params,
+            )
+            scale = _wsi.level_downsamples[_wsi.seg_level]
+            self.contours = _wsi.scaleContourDim(contours, (1.0 / scale[0], 1.0 / scale[1]))
+            self.holes = _wsi.scaleHolesDim(holes, (1.0 / scale[0], 1.0 / scale[1]))
+            self.tissue_mask = _wsi.annotation_mask["tissue"]
+            self.seg_spacing = _wsi.get_level_spacing(_wsi.seg_level)
+            self.spacing_at_level_0 = _wsi.get_level_spacing(0)
+    def load_coordinates(self, coordinates_dir):
+        coordinates = np.load(Path(coordinates_dir, f"{self.name}.npy"), allow_pickle=True)
+        self.x = coordinates["x"]
+        self.y = coordinates["y"]
+        self.coordinates = (np.array([self.x, self.y]).T).astype(int)
+        self.scaled_coordinates = self.scale_coordinates()
+        self.contour_index = coordinates["contour_index"]
+        self.target_tile_size = coordinates["target_tile_size"]
+        self.tile_level = coordinates["tile_level"]
+        self.resize_factor = coordinates["resize_factor"]
+        self.tile_size_resized = coordinates["tile_size_resized"]
+        self.tile_size_lv0 = coordinates["tile_size_lv0"][0]
+    def scale_coordinates(self):
+        # coordinates are defined w.r.t. level 0
+        # i need to scale them to target_spacing
+        wsi = wsd.WholeSlideImage(self.path, backend=self.backend)
+        min_spacing = wsi.spacings[0]
+        scale = min_spacing / self.target_spacing
+        # create a [N, 2] array with x and y coordinates
+        scaled_coordinates = (self.coordinates * scale).astype(int)
+        return scaled_coordinates
+    def __len__(self):
+        return len(self.x)
+    def __getitem__(self, idx):
+        wsi = wsd.WholeSlideImage(
+            self.path, backend=self.backend
+        )  # cannot be defined in __init__ because of multiprocessing
+        tile_level = self.tile_level[idx]
+        tile_spacing = wsi.spacings[tile_level]
+        tile_arr = wsi.get_patch(
+            self.x[idx],
+            self.y[idx],
+            self.tile_size_resized[idx],
+            self.tile_size_resized[idx],
+            spacing=tile_spacing,
+            center=False,
+        )
+        if self.restrict_to_tissue:
+            contour_idx = self.contour_index[idx]
+            contour = self.contours[contour_idx]
+            holes = self.holes[contour_idx]
+            tissue_checker = HasEnoughTissue(
+                contour=contour,
+                contour_holes=holes,
+                tissue_mask=self.tissue_mask,
+                tile_size=self.target_tile_size[idx],
+                tile_spacing=tile_spacing,
+                resize_factor=self.resize_factor[idx],
+                seg_spacing=self.seg_spacing,
+                spacing_at_level_0=self.spacing_at_level_0,
+            )
+            tissue_mask = tissue_checker.get_tile_mask(self.x[idx], self.y[idx])
+            # ensure mask is the same size as the tile
+            assert tissue_mask.shape[:2] == tile_arr.shape[:2], "Mask and tile shapes do not match"
+            # apply mask
+            tile_arr = cv2.bitwise_and(tile_arr, tile_arr, mask=tissue_mask)
+        tile = Image.fromarray(tile_arr).convert("RGB")
+        if self.target_tile_size[idx] != self.tile_size_resized[idx]:
+            tile = tile.resize((self.target_tile_size[idx], self.target_tile_size[idx]))
+        if self.transforms:
+            if isinstance(self.transforms, BaseImageProcessor):  # Hugging Face (`transformer`)
+                tile = self.transforms(tile, return_tensors="pt")["pixel_values"].squeeze(0)
+            else:  # general callable such as torchvision transforms
+                tile = self.transforms(tile)
+        return idx, tile
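
Review note on `scale_coordinates` above: the method rescales level-0 coordinates by `min_spacing / target_spacing` and truncates to integers. The sketch below is a pure-Python stand-in for the NumPy expression, with made-up spacings and coordinates, to make the arithmetic concrete. The free function and its arguments are illustrative and not the package's API.

```python
def scale_coordinates(coordinates, min_spacing, target_spacing):
    """Map level-0 (x, y) pixel coordinates to the pixel grid at target_spacing."""
    scale = min_spacing / target_spacing
    # int() truncates toward zero, like NumPy's astype(int).
    return [(int(x * scale), int(y * scale)) for x, y in coordinates]


# Level 0 at 0.25 um/px, target at 0.5 um/px: every coordinate is halved.
scaled = scale_coordinates([(1000, 2048), (3, 5)], min_spacing=0.25, target_spacing=0.5)
```

With these inputs the scale is 0.5, so `(1000, 2048)` maps to `(500, 1024)` and `(3, 5)` truncates to `(1, 2)`.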

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/embed.py RENAMED

@@ -18,6 +18,7 @@ from slide2vec.utils import fix_random_seeds
 from slide2vec.utils.config import get_cfg_from_file, setup_distributed
 from slide2vec.models import ModelFactory
 from slide2vec.data import TileDataset, RegionUnfolding
+from slide2vec.hs2p.hs2p.wsi import SamplingParameters
 torchvision.disable_beta_transforms_warning()
@@ -60,13 +61,31 @@ def create_transforms(cfg, model):
         raise ValueError(f"Unknown model level: {cfg.model.level}")
-def create_dataset(wsi_fp, coordinates_dir, spacing, backend, transforms):
+def create_dataset(
+    wsi_path,
+    mask_path,
+    coordinates_dir,
+    target_spacing,
+    tolerance,
+    backend,
+    segment_params,
+    sampling_params,
+    filter_params,
+    transforms,
+    restrict_to_tissue: bool,
+):
     return TileDataset(
-        wsi_fp,
-        coordinates_dir,
-        spacing,
+        wsi_path=wsi_path,
+        mask_path=mask_path,
+        coordinates_dir=coordinates_dir,
+        target_spacing=target_spacing,
+        tolerance=tolerance,
         backend=backend,
+        segment_params=segment_params,
+        sampling_params=sampling_params,
+        filter_params=filter_params,
         transforms=transforms,
+        restrict_to_tissue=restrict_to_tissue,
     )
@@ -154,10 +173,13 @@ def main(args):
         process_list.is_file()
     ), "Process list CSV not found. Ensure tiling has been run."
     process_df = pd.read_csv(process_list)
+    cols = ["wsi_name", "wsi_path", "tiling_status", "error", "traceback"]
     if "feature_status" not in process_df.columns:
         process_df["feature_status"] = ["tbp"] * len(process_df)
-        cols = ["wsi_name", "wsi_path", "mask_path", "tiling_status", "feature_status", "error", "traceback"]
-        process_df = process_df[cols]
+    if "mask_path" not in process_df.columns:
+        process_df["mask_path"] = [None] * len(process_df)
+    cols = ["wsi_name", "wsi_path", "mask_path", "tiling_status", "feature_status", "error", "traceback"]
+    process_df = process_df[cols]
     skip_feature_extraction = process_df["feature_status"].str.contains("success").all()
@@ -176,12 +198,30 @@ def main(args):
         if not run_on_cpu:
             torch.distributed.barrier()
+        pixel_mapping = {k: v for e in cfg.tiling.sampling_params.pixel_mapping for k, v in e.items()}
+        tissue_percentage = {k: v for e in cfg.tiling.sampling_params.tissue_percentage for k, v in e.items()}
+        if "tissue" not in tissue_percentage:
+            tissue_percentage["tissue"] = cfg.tiling.params.min_tissue_percentage
+        if cfg.tiling.sampling_params.color_mapping is not None:
+            color_mapping = {k: v for e in cfg.tiling.sampling_params.color_mapping for k, v in e.items()}
+        else:
+            color_mapping = None
+        sampling_params = SamplingParameters(
+            pixel_mapping=pixel_mapping,
+            color_mapping=color_mapping,
+            tissue_percentage=tissue_percentage,
+        )
         # select slides that were successfully tiled but not yet processed for feature extraction
         tiled_df = process_df[process_df.tiling_status == "success"]
         mask = tiled_df["feature_status"] != "success"
         process_stack = tiled_df[mask]
         total = len(process_stack)
         wsi_paths_to_process = [Path(x) for x in process_stack.wsi_path.values.tolist()]
+        mask_paths_to_process = [Path(x) if x is not None and not pd.isna(x) else None  for x in process_stack.mask_path.values.tolist()]
+        combined_paths = zip(wsi_paths_to_process, mask_paths_to_process)
         features_dir = Path(cfg.output_dir, "features")
         if distributed.is_main_process():
@@ -201,8 +241,8 @@ def main(args):
         transforms = create_transforms(cfg, model)
         print(f"transforms: {transforms}")
-        for wsi_fp in tqdm.tqdm(
-            wsi_paths_to_process,
+        for wsi_fp, mask_fp in tqdm.tqdm(
+            combined_paths,
             desc="Inference",
             unit="slide",
             total=total,
@@ -211,7 +251,19 @@ def main(args):
             position=1,
         ):
             try:
-                dataset = create_dataset(wsi_fp, coordinates_dir, cfg.tiling.params.spacing, cfg.tiling.backend, transforms)
+                dataset = create_dataset(
+                    wsi_path=wsi_fp,
+                    mask_path=mask_fp,
+                    coordinates_dir=coordinates_dir,
+                    target_spacing=cfg.tiling.params.spacing,
+                    tolerance=cfg.tiling.params.tolerance,
+                    backend=cfg.tiling.backend,
+                    segment_params=cfg.tiling.seg_params,
+                    sampling_params=sampling_params,
+                    filter_params=cfg.tiling.filter_params,
+                    transforms=transforms,
+                    restrict_to_tissue=cfg.model.restrict_to_tissue,
+                )
                 if distributed.is_enabled_and_multiple_gpus():
                     sampler = torch.utils.data.DistributedSampler(
                         dataset,
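
Review note on the `main` changes above: two small idioms carry most of the new logic. The config stores the mappings as lists of single-key entries that get flattened into one dict, and empty `mask_path` cells (either `None` or NaN after `pd.read_csv`) are normalized to `None`. The sketch below illustrates both with the standard library only. `flatten_mapping`, `to_optional_path`, and the sample values are hypothetical, and `math.isnan` stands in for `pd.isna`.

```python
import math
from pathlib import Path


def flatten_mapping(entries):
    """Collapse a list of single-key dicts into one dict (later keys win)."""
    return {k: v for entry in entries for k, v in entry.items()}


def to_optional_path(value):
    """Return a Path, or None when the CSV cell was empty (None or NaN)."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    return Path(value)


pixel_mapping = flatten_mapping([{"background": 0}, {"tissue": 1}])
tissue_percentage = flatten_mapping([{"background": None}])
# Mirror the fallback in the diff: use a global threshold when "tissue" is absent.
tissue_percentage.setdefault("tissue", 0.01)

mask_paths = [to_optional_path(v) for v in ["/path/to/mask1.tif", float("nan"), None]]
```

Because the diff zips slide paths and mask paths together, both lists must stay the same length and order, which is why missing masks are kept as `None` placeholders instead of being dropped.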

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/models/models.py RENAMED

@@ -13,7 +13,6 @@ from timm.data.constants import IMAGENET_INCEPTION_MEAN, IMAGENET_INCEPTION_STD
 from timm.data.transforms_factory import create_transform
 from conch.open_clip_custom import create_model_from_pretrained
-from musk import modeling as musk_modeling
 from musk import utils as musk_utils
 import slide2vec.distributed as distributed
@@ -70,11 +69,12 @@ class ModelFactory:
                     pretrained_weights=options.pretrained_weights,
                     input_size=options.tile_size,
                 )
-            elif options.name is None and options.arch:
+            elif options.name == "dino" and options.arch:
                 model = DINOViT(
                     arch=options.arch,
                     pretrained_weights=options.pretrained_weights,
                     input_size=options.tile_size,
+                    patch_size=options.token_size,
                 )
         elif options.level == "region":
             if options.name == "virchow":
@@ -259,7 +259,17 @@ class DINOViT(FeatureExtractor):
     def load_weights(self):
         if distributed.is_main_process():
             print(f"Loading pretrained weights from: {self.pretrained_weights}")
-        state_dict = torch.load(self.pretrained_weights, map_location="cpu")
+        # Fix for loading checkpoints saved with numpy 2.0+ in an environment with numpy < 2.0
+        try:
+            import numpy._core
+        except ImportError:
+            import numpy as np
+            import sys
+            sys.modules["numpy._core"] = np.core
+            sys.modules["numpy._core.multiarray"] = np.core.multiarray
+        state_dict = torch.load(self.pretrained_weights, map_location="cpu", weights_only=False)
         if self.ckpt_key:
             state_dict = state_dict[self.ckpt_key]
         nn.modules.utils.consume_prefix_in_state_dict_if_present(
@@ -282,21 +292,13 @@ class DINOViT(FeatureExtractor):
         return encoder
     def get_transforms(self):
-        if self.input_size > 224:
-            transform = transforms.Compose(
-                [
-                    MaybeToTensor(),
-                    transforms.CenterCrop(224),
-                    make_normalize_transform(),
-                ]
-            )
-        else:
-            transforms.Compose(
-                [
-                    MaybeToTensor(),
-                    make_normalize_transform(),
-                ]
-            )
+        transform = transforms.Compose(
+            [
+                MaybeToTensor(),
+                transforms.CenterCrop(self.input_size),
+                make_normalize_transform(),
+            ]
+        )
         return transform
     def forward(self, x):
@@ -344,7 +346,7 @@ class CustomViT(FeatureExtractor):
     def load_weights(self):
         if distributed.is_main_process():
             print(f"Loading pretrained weights from: {self.pretrained_weights}")
-        state_dict = torch.load(self.pretrained_weights, map_location="cpu")
+        state_dict = torch.load(self.pretrained_weights, map_location="cpu", weights_only=False)
         if self.ckpt_key:
             state_dict = state_dict[self.ckpt_key]
         nn.modules.utils.consume_prefix_in_state_dict_if_present(
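
Review note on `DINOViT.get_transforms` above: in 2.0.0 the `else` branch built a pipeline without binding it to `transform`, so any `input_size` of 224 or less would reach `return transform` with the name unassigned. The 2.0.1 version has a single code path and crops to `self.input_size` instead of a fixed 224. The sketch below mimics both versions with lists of strings in place of the torchvision transforms; the function names and string labels are illustrative only.

```python
def get_transforms_old(input_size: int):
    # Stand-in for the 2.0.0 logic.
    if input_size > 224:
        transform = ["to_tensor", "center_crop(224)", "normalize"]
    else:
        # The pipeline is built here but never bound to `transform`.
        ["to_tensor", "normalize"]
    return transform


def get_transforms_new(input_size: int):
    # Stand-in for the 2.0.1 logic: one path, crop size follows input_size.
    transform = ["to_tensor", f"center_crop({input_size})", "normalize"]
    return transform
```

Note the behavior change for larger inputs: a 256-pixel tile was previously cropped to 224, whereas the new code keeps it at 256.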

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/utils/__init__.py RENAMED

@@ -2,7 +2,6 @@ from .utils import (
     initialize_wandb,
     fix_random_seeds,
     get_sha,
-    load_csv,
     update_state_dict,
 )
 from .log_utils import setup_logging

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/utils/config.py RENAMED

@@ -11,7 +11,7 @@ from omegaconf import OmegaConf
 import slide2vec.distributed as distributed
 from slide2vec.utils import initialize_wandb, fix_random_seeds, get_sha, setup_logging
-from slide2vec.configs import default_config
+from slide2vec.configs import default_tiling_config, default_model_config
 logger = logging.getLogger("slide2vec")
@@ -25,7 +25,9 @@ def write_config(cfg, output_dir, name="config.yaml"):
 def get_cfg_from_file(config_file):
-    default_cfg = OmegaConf.create(default_config)
+    default_tiling_cfg = OmegaConf.create(default_tiling_config)
+    default_embedding_cfg = OmegaConf.create(default_model_config)
+    default_cfg = OmegaConf.merge(default_tiling_cfg, default_embedding_cfg)
     cfg = OmegaConf.load(config_file)
     cfg = OmegaConf.merge(default_cfg, cfg)
     OmegaConf.resolve(cfg)
@@ -36,7 +38,9 @@ def get_cfg_from_args(args):
     if args.output_dir is not None:
         args.output_dir = os.path.abspath(args.output_dir)
         args.opts += [f"output_dir={args.output_dir}"]
-    default_cfg = OmegaConf.create(default_config)
+    default_tiling_cfg = OmegaConf.create(default_tiling_config)
+    default_embedding_cfg = OmegaConf.create(default_model_config)
+    default_cfg = OmegaConf.merge(default_tiling_cfg, default_embedding_cfg)
     cfg = OmegaConf.load(args.config_file)
     cfg = OmegaConf.merge(default_cfg, cfg, OmegaConf.from_cli(args.opts))
     OmegaConf.resolve(cfg)

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/utils/utils.py RENAMED

@@ -111,21 +111,6 @@ def initialize_wandb(
     return run
-def load_csv(cfg):
-    df = pd.read_csv(cfg.csv)
-    if "wsi_path" in df.columns:
-        wsi_paths = [Path(x) for x in df.wsi_path.values.tolist()]
-    elif "slide_path" in df.columns:
-        wsi_paths = [Path(x) for x in df.slide_path.values.tolist()]
-    if "mask_path" in df.columns:
-        mask_paths = [Path(x) for x in df.mask_path.values.tolist()]
-    elif "segmentation_mask_path" in df.columns:
-        mask_paths = [Path(x) for x in df.segmentation_mask_path.values.tolist()]
-    else:
-        mask_paths = [None for _ in wsi_paths]
-    return wsi_paths, mask_paths
 def update_state_dict(
     *,
     model_dict: dict,

{slide2vec-2.0.0 → slide2vec-2.0.1/slide2vec.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: slide2vec
-Version: 2.0.0
+Version: 2.0.1
 Summary: Embedding of whole slide images with Foundation Models
 Home-page: https://github.com/clemsgrs/slide2vec
 Author: Clément Grisi
@@ -58,6 +58,37 @@ Dynamic: project-url
 [![Docker Version](https://img.shields.io/docker/v/waticlems/slide2vec?sort=semver&label=docker&logo=docker&color=2496ED)](https://hub.docker.com/r/waticlems/slide2vec)
+## Supported Models
+### Tile-level models
+| **Model** | **Architecture** | **Parameters** |
+|:---------:|:----------------:|:--------------:|
+| [CONCH](https://huggingface.co/MahmoodLab/conch) | ViT-B/16 | 86M |
+| [H0-mini](https://huggingface.co/bioptimus/H0-mini) | ViT-B/16 | 86M |
+| [Hibou-B](https://huggingface.co/histai/hibou-b) | ViT-B/16 | 86M |
+| [Hibou-L](https://huggingface.co/histai/hibou-L) | ViT-L/16 | 307M |
+| [MUSK](https://huggingface.co/xiangjx/musk) | ViT-L/16 | 307M |
+| [Phikon-v2](https://huggingface.co/owkin/phikon-v2) | ViT-L/16 | 307M |
+| [UNI](https://huggingface.co/MahmoodLab/UNI) | ViT-L/16 | 307M |
+| [Virchow](https://huggingface.co/paige-ai/Virchow) | ViT-H/14 | 632M |
+| [Virchow2](https://huggingface.co/paige-ai/Virchow2) | ViT-H/14 | 632M |
+| [MidNight12k](https://huggingface.co/kaiko-ai/midnight) | ViT-G/14 | 1.1B |
+| [UNI2](https://huggingface.co/MahmoodLab/UNI2-h) | ViT-G/14 | 1.1B |
+| [Prov-GigaPath](https://huggingface.co/prov-gigapath/prov-gigapath) | ViT-G/14 | 1.1B |
+| [H-optimus-0](https://huggingface.co/bioptimus/H-optimus-0) | ViT-G/14 | 1.1B |
+| [H-optimus-1](https://huggingface.co/bioptimus/H-optimus-1) | ViT-G/14 | 1.1B |
+| [Kaiko](https://github.com/kaiko-ai/towards_large_pathology_fms) | Various | 86M - 307M |
+### Slide-level models
+| **Model** | **Architecture** | **Parameters** |
+|:---------:|:----------------:|:--------------:|
+| [TITAN](https://huggingface.co/MahmoodLab/TITAN) | Transformer | 49M |
+| [Prov-GigaPath](https://huggingface.co/prov-gigapath/prov-gigapath) | Transformer (LongNet) | 87M |
+| [PRISM](https://huggingface.co/paige-ai/PRISM) | Perceiver Resampler | 99M |
 ## 🛠️ Installation
 System requirements: Linux-based OS (e.g., Ubuntu 22.04) with Python 3.10+ and Docker installed.
@@ -77,7 +108,7 @@ Replace `/path/to/your/data` with your local data directory.
 Alternatively, you can install `slide2vec` via pip:
 ```shell
-pip install slide2vec
+pip install slide2vechel
 ```
 ## 🚀 Extract features
@@ -93,10 +124,11 @@ pip install slide2vec
 2. Create a configuration file
-   A good starting point is the default configuration file `slide2vec/configs/default.yaml` where parameters are documented.<br>
-   We've also added default configuration files for each of the foundation models currently supported:
-   - tile-level: `uni`, `uni2`, `virchow`, `virchow2`, `prov-gigapath`, `h-optimus-0`, `h-optimus-1`, `h0-mini`, `conch`, `musk`, `phikonv2`, `hibou-b`, `hibou-L`, [`kaiko`](https://github.com/kaiko-ai/towards_large_pathology_fms)
-   - slide-level: `prov-gigapath`, `titan`, `prism`
+   A good starting point are the default configuration files where parameters are documented:<br>
+   - for preprocessing options: `slide2vec/configs/default_tiling.yaml`
+   - for model options: `slide2vec/configs/default_model_.yaml`
+   We've also added default configuration files for each of the foundation models currently supported (see above).
 3. Kick off distributed feature extraction

slide2vec-2.0.0/README.md DELETED

@@ -1,52 +0,0 @@
-# slide2vec
-[![PyPI version](https://img.shields.io/pypi/v/slide2vec?label=pypi&logo=pypi&color=3776AB)](https://pypi.org/project/slide2vec/)
-[![Docker Version](https://img.shields.io/docker/v/waticlems/slide2vec?sort=semver&label=docker&logo=docker&color=2496ED)](https://hub.docker.com/r/waticlems/slide2vec)
-## 🛠️ Installation
-System requirements: Linux-based OS (e.g., Ubuntu 22.04) with Python 3.10+ and Docker installed.
-We recommend running the script inside a container using the latest `slide2vec` image from Docker Hub:
-```shell
-docker pull waticlems/slide2vec:latest
-docker run --rm -it \
-    -v /path/to/your/data:/data \
-    -e HF_TOKEN=<your-huggingface-api-token> \
-    waticlems/slide2vec:latest
-```
-Replace `/path/to/your/data` with your local data directory.
-Alternatively, you can install `slide2vec` via pip:
-```shell
-pip install slide2vec
-```
-## 🚀 Extract features
-1. Create a `.csv` file with slide paths. Optionally, you can provide paths to pre-computed tissue masks.
-    ```csv
-    wsi_path,mask_path
-    /path/to/slide1.tif,/path/to/mask1.tif
-    /path/to/slide2.tif,/path/to/mask2.tif
-    ...
-    ```
-2. Create a configuration file
-   A good starting point is the default configuration file `slide2vec/configs/default.yaml` where parameters are documented.<br>
-   We've also added default configuration files for each of the foundation models currently supported:
-   - tile-level: `uni`, `uni2`, `virchow`, `virchow2`, `prov-gigapath`, `h-optimus-0`, `h-optimus-1`, `h0-mini`, `conch`, `musk`, `phikonv2`, `hibou-b`, `hibou-L`, [`kaiko`](https://github.com/kaiko-ai/towards_large_pathology_fms)
-   - slide-level: `prov-gigapath`, `titan`, `prism`
-3. Kick off distributed feature extraction
-    ```shell
-    python3 -m slide2vec.main --config-file </path/to/config.yaml>
-    ```

slide2vec-2.0.0/slide2vec/__init__.py DELETED

@@ -1 +0,0 @@
-__version__ = "2.0.0"

slide2vec-2.0.0/slide2vec/configs/__init__.py DELETED

@@ -1,17 +0,0 @@
-import pathlib
-from omegaconf import OmegaConf
-def load_config(config_name: str):
-    config_filename = config_name + ".yaml"
-    return OmegaConf.load(pathlib.Path(__file__).parent.resolve() / config_filename)
-default_config = load_config("default")
-def load_and_merge_config(config_name: str):
-    default_config = OmegaConf.create(default_config)
-    loaded_config = load_config(config_name)
-    return OmegaConf.merge(default_config, loaded_config)

slide2vec-2.0.0/slide2vec/data/dataset.py DELETED

@@ -1,66 +0,0 @@
-import torch
-import numpy as np
-import wholeslidedata as wsd
-from transformers.image_processing_utils import BaseImageProcessor
-from PIL import Image
-from pathlib import Path
-class TileDataset(torch.utils.data.Dataset):
-    def __init__(self, wsi_path, tile_dir, target_spacing, backend, transforms=None):
-        self.path = wsi_path
-        self.target_spacing = target_spacing
-        self.backend = backend
-        self.name = wsi_path.stem.replace(" ", "_")
-        self.load_coordinates(tile_dir)
-        self.transforms = transforms
-    def load_coordinates(self, tile_dir):
-        coordinates = np.load(Path(tile_dir, f"{self.name}.npy"), allow_pickle=True)
-        self.x = coordinates["x"]
-        self.y = coordinates["y"]
-        self.coordinates = (np.array([self.x, self.y]).T).astype(int)
-        self.scaled_coordinates = self.scale_coordinates()
-        self.tile_level = coordinates["tile_level"]
-        self.tile_size_resized = coordinates["tile_size_resized"]
-        resize_factor = coordinates["resize_factor"]
-        self.tile_size = np.round(self.tile_size_resized / resize_factor).astype(int)
-        self.tile_size_lv0 = coordinates["tile_size_lv0"][0]
-    def scale_coordinates(self):
-        # coordinates are defined w.r.t. level 0
-        # i need to scale them to target_spacing
-        wsi = wsd.WholeSlideImage(self.path, backend=self.backend)
-        min_spacing = wsi.spacings[0]
-        scale = min_spacing / self.target_spacing
-        # create a [N, 2] array with x and y coordinates
-        scaled_coordinates = (self.coordinates * scale).astype(int)
-        return scaled_coordinates
-    def __len__(self):
-        return len(self.x)
-    def __getitem__(self, idx):
-        wsi = wsd.WholeSlideImage(
-            self.path, backend=self.backend
-        )  # cannot be defined in __init__ because of multiprocessing
-        tile_level = self.tile_level[idx]
-        tile_spacing = wsi.spacings[tile_level]
-        tile_arr = wsi.get_patch(
-            self.x[idx],
-            self.y[idx],
-            self.tile_size_resized[idx],
-            self.tile_size_resized[idx],
-            spacing=tile_spacing,
-            center=False,
-        )
-        tile = Image.fromarray(tile_arr).convert("RGB")
-        if self.tile_size[idx] != self.tile_size_resized[idx]:
-            tile = tile.resize((self.tile_size[idx], self.tile_size[idx]))
-        if self.transforms:
-            if isinstance(self.transforms, BaseImageProcessor):  # Hugging Face (`transformer`)
-                tile = self.transforms(tile, return_tensors="pt")["pixel_values"].squeeze(0)
-            else:  # general callable such as torchvision transforms
-                tile = self.transforms(tile)
-        return idx, tile
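
The arithmetic in the removed `scale_coordinates` and `load_coordinates` can be illustrated without NumPy or a slide reader. The sketch below assumes, as the removed comments state, that coordinates are defined at level 0; the function names and the numeric values are hypothetical and chosen only to make the scaling easy to follow.

```python
def scale_to_target(coords_lv0, min_spacing, target_spacing):
    """Scale level-0 (x, y) coordinates to the target spacing.

    Mirrors `scale = min_spacing / target_spacing` followed by an integer
    cast, which truncates toward zero.
    """
    scale = min_spacing / target_spacing
    return [(int(x * scale), int(y * scale)) for x, y in coords_lv0]


def tile_size_at_target(tile_size_resized, resize_factor):
    """Mirror `round(tile_size_resized / resize_factor)` cast to int."""
    return int(round(tile_size_resized / resize_factor))


if __name__ == "__main__":
    # A slide scanned at 0.25 um/px, with tiles requested at 0.5 um/px:
    # the scale is 0.5, so level-0 coordinates are halved.
    print(scale_to_target([(1000, 2048)], min_spacing=0.25, target_spacing=0.5))
    # A 448 px tile read from the slide with a resize factor of 2.0
    # corresponds to a 224 px tile at the target spacing.
    print(tile_size_at_target(448, 2.0))
```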

{slide2vec-2.0.0 → slide2vec-2.0.1}/LICENSE RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/MANIFEST.in RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/setup.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/aggregate.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/data/__init__.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/data/augmentations.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/distributed/__init__.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/main.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/models/__init__.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/models/layers/__init__.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/models/layers/attention.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/models/layers/block.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/models/layers/dino_head.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/models/layers/drop_path.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/models/layers/layer_scale.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/models/layers/mlp.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/models/layers/patch_embed.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/models/layers/swiglu_ffn.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/models/vision_transformer_dino.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/models/vision_transformer_dinov2.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec/utils/log_utils.py RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec.egg-info/SOURCES.txt RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec.egg-info/dependency_links.txt RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec.egg-info/not-zip-safe RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec.egg-info/requires.txt RENAMED

File without changes

{slide2vec-2.0.0 → slide2vec-2.0.1}/slide2vec.egg-info/top_level.txt RENAMED

File without changes
