deepliif 1.1.15__py3-none-any.whl → 1.2.1__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,628 +1,624 @@
1
- Metadata-Version: 2.1
2
- Name: deepliif
3
- Version: 1.1.15
4
- Summary: DeepLIIF: Deep-Learning Inferred Multiplex Immunofluorescence for Immunohistochemical Image Quantification
5
- Home-page: https://github.com/nadeemlab/DeepLIIF
6
- Author: Parmida93
7
- Author-email: ghahremani.parmida@gmail.com
8
- License: UNKNOWN
9
- Keywords: DeepLIIF,IHC,Segmentation,Classification
10
- Platform: UNKNOWN
11
- Description-Content-Type: text/markdown
12
- License-File: LICENSE.md
13
- Requires-Dist: opencv-python (==4.8.1.78)
14
- Requires-Dist: torch (==1.13.1)
15
- Requires-Dist: torchvision (==0.14.1)
16
- Requires-Dist: scikit-image (==0.18.3)
17
- Requires-Dist: dominate (==2.6.0)
18
- Requires-Dist: numba (==0.57.1)
19
- Requires-Dist: Click (==8.0.3)
20
- Requires-Dist: requests (==2.32.2)
21
- Requires-Dist: dask (==2021.11.2)
22
- Requires-Dist: visdom (>=0.1.8.3)
23
- Requires-Dist: python-bioformats (>=4.0.6)
24
- Requires-Dist: imagecodecs (==2023.3.16)
25
- Requires-Dist: zarr (==2.16.1)
26
-
27
-
28
- <!-- PROJECT LOGO -->
29
- <br />
30
- <p align="center">
31
- <img src="./images/DeepLIIF_logo.png" width="50%">
32
- <h3 align="center"><strong>Deep-Learning Inferred Multiplex Immunofluorescence for Immunohistochemical Image Quantification</strong></h3>
33
- <p align="center">
34
- <a href="https://rdcu.be/cKSBz">Nature MI'22</a>
35
- |
36
- <a href="https://openaccess.thecvf.com/content/CVPR2022/html/Ghahremani_DeepLIIF_An_Online_Platform_for_Quantification_of_Clinical_Pathology_Slides_CVPR_2022_paper.html">CVPR'22</a>
37
- |
38
- <a href="https://arxiv.org/abs/2305.16465">MICCAI'23</a>
39
- |
40
- <a href="https://onlinelibrary.wiley.com/share/author/4AEBAGEHSZE9GDP3H8MN?target=10.1111/his.15048">Histopathology'23</a>
41
- |
42
- <a href="https://arxiv.org/abs/2405.08169">MICCAI'24</a>
43
- |
44
- <a href="https://deepliif.org/">Cloud Deployment</a>
45
- |
46
- <a href="https://nadeemlab.github.io/DeepLIIF/">Documentation</a>
47
- |
48
- <a href="#support">Support</a>
49
- </p>
50
- </p>
51
-
52
- *Reporting biomarkers assessed by routine immunohistochemical (IHC) staining of tissue is broadly used in diagnostic
53
- pathology laboratories for patient care. To date, clinical reporting is predominantly qualitative or semi-quantitative.
54
- By creating a multitask deep learning framework referred to as DeepLIIF, we present a single-step solution to stain
55
- deconvolution/separation, cell segmentation, and quantitative single-cell IHC scoring. Leveraging a unique de novo
56
- dataset of co-registered IHC and multiplex immunofluorescence (mpIF) staining of the same slides, we segment and
57
- translate low-cost and prevalent IHC slides to more expensive-yet-informative mpIF images, while simultaneously
58
- providing the essential ground truth for the superimposed brightfield IHC channels. Moreover, a new nuclear-envelope
59
- stain, LAP2beta, with high (>95%) cell coverage is introduced to improve cell delineation/segmentation and protein
60
- expression quantification on IHC slides. By simultaneously translating input IHC images to clean/separated mpIF channels
61
- and performing cell segmentation/classification, we show that our model trained on clean IHC Ki67 data can generalize to
62
- more noisy and artifact-ridden images as well as other nuclear and non-nuclear markers such as CD3, CD8, BCL2, BCL6,
63
- MYC, MUM1, CD10, and TP53. We thoroughly evaluate our method on publicly available benchmark datasets as well as against
64
- pathologists' semi-quantitative scoring. Trained on IHC, DeepLIIF generalizes well to H&E images for out-of-the-box nuclear
65
- segmentation.*
66
-
67
- **DeepLIIF** is deployed as a free publicly available cloud-native platform (https://deepliif.org) with Bioformats (more than 150 input formats supported) and MLOps pipeline. We also release **DeepLIIF** implementations for single/multi-GPU training, Torchserve/Dask+Torchscript deployment, and auto-scaling via Pulumi (1000s of concurrent connections supported); details can be found in our [documentation](https://nadeemlab.github.io/DeepLIIF/). **DeepLIIF** can be run locally (GPU required) by [pip installing the package](https://github.com/nadeemlab/DeepLIIF/edit/main/README.md#installing-deepliif) and using the deepliif CLI command. **DeepLIIF** can be used remotely (no GPU required) through the https://deepliif.org website, calling the [cloud API via Python](https://github.com/nadeemlab/DeepLIIF/edit/main/README.md#cloud-deployment), or via the [ImageJ/Fiji plugin](https://github.com/nadeemlab/DeepLIIF/edit/main/README.md#imagej-plugin); details for the free cloud-native platform can be found in our [CVPR'22 paper](https://arxiv.org/pdf/2204.04494.pdf).
68
-
69
- © This code is made available for non-commercial academic purposes.
70
-
71
- ![Version](https://img.shields.io/static/v1?label=latest&message=v1.1.15&color=darkgreen)
72
- [![Total Downloads](https://static.pepy.tech/personalized-badge/deepliif?period=total&units=international_system&left_color=grey&right_color=blue&left_text=total%20downloads)](https://pepy.tech/project/deepliif?&left_text=totalusers)
73
-
74
- ![overview_image](./images/overview.png)*Overview of DeepLIIF pipeline and sample input IHCs (different
75
- brown/DAB markers -- BCL2, BCL6, CD10, CD3/CD8, Ki67) with corresponding DeepLIIF-generated hematoxylin/mpIF modalities
76
- and classified (positive (red) and negative (blue) cell) segmentation masks. (a) Overview of DeepLIIF. Given an IHC
77
- input, our multitask deep learning framework simultaneously infers corresponding Hematoxylin channel, mpIF DAPI, mpIF
78
- protein expression (Ki67, CD3, CD8, etc.), and the positive/negative protein cell segmentation, baking explainability
79
- and interpretability into the model itself rather than relying on coarse activation/attention maps. In the segmentation
80
- mask, the red cells denote cells with positive protein expression (brown/DAB cells in the input IHC), whereas blue cells
81
- represent negative cells (blue cells in the input IHC). (b) Example DeepLIIF-generated hematoxylin/mpIF modalities and
82
- segmentation masks for different IHC markers. DeepLIIF, trained on clean IHC Ki67 nuclear marker images, can generalize
83
- to noisier as well as other IHC nuclear/cytoplasmic marker images.*
84
-
85
- ## Prerequisites
86
- 1. Python 3.8
87
- 2. Docker
88
-
89
- ## Installing `deepliif`
90
-
91
- DeepLIIF can be `pip` installed:
92
- ```shell
93
- $ conda create --name deepliif_env python=3.8
94
- $ conda activate deepliif_env
95
- (deepliif_env) $ conda install -c conda-forge openjdk
96
- (deepliif_env) $ pip install deepliif
97
- ```
98
-
99
- The package is composed of two parts:
100
- 1. A library that implements the core functions used to train and test DeepLIIF models.
101
- 2. A CLI to run common batch operations including training, batch testing, and Torchscript model serialization.
102
-
103
- You can list all available commands:
104
-
105
- ```
106
- (venv) $ deepliif --help
107
- Usage: deepliif [OPTIONS] COMMAND [ARGS]...
108
-
109
- Options:
110
- --help Show this message and exit.
111
-
112
- Commands:
113
- prepare-testing-data Preparing data for testing
114
- serialize Serialize DeepLIIF models using Torchscript
115
- test Test trained models
116
- train General-purpose training script for multi-task...
117
- ```
118
-
119
- **Note:** You might need to install a version of PyTorch that is compatible with your CUDA version.
120
- Otherwise, only the CPU will be used.
121
- Visit the [PyTorch website](https://pytorch.org/) for details.
122
- You can confirm if your installation will run on the GPU by checking if the following returns `True`:
123
-
124
- ```
125
- import torch
126
- torch.cuda.is_available()
127
- ```
128
-
129
- ## Training Dataset
130
- For training, all images must be 512x512 and combined into 3072x512 images (six images of size 512x512 stitched
131
- together horizontally).
132
- The data need to be arranged in the following order:
133
- ```
134
- XXX_Dataset
135
- ├── train
136
- └── val
137
- ```
138
- We have provided a simple function in the CLI for preparing data for training.
139
-
140
- * **To prepare data for training**, you need to have all of the images for each image set (IHC, Hematoxylin Channel, mpIF DAPI, mpIF Lap2, mpIF marker, and segmentation mask) in the input directory (see the stitching sketch after the command below).
141
- Each of the six images for a single image set must have the same naming format, with only the name of the label for the type of image differing between them. The label names must be, respectively: IHC, Hematoxylin, DAPI, Lap2, Marker, Seg.
142
- The command takes the address of the directory containing image set data and the address of the output dataset directory.
143
- It first creates the train and validation directories inside the given output dataset directory.
144
- It then reads all of the images in the input directory and saves the combined image in the train or validation directory, based on the given `validation_ratio`.
145
- ```
146
- deepliif prepare-training-data --input-dir /path/to/input/images
147
- --output-dir /path/to/output/images
148
- --validation-ratio 0.2
149
- ```
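-
- As a purely illustrative sketch of the combined 3072x512 layout described above (the `prepare-training-data` command handles this for you), the six modality images can be stitched horizontally with Pillow. The file-naming pattern (`<name>_IHC.png`, `<name>_DAPI.png`, etc.) is an assumption for this example; adjust it to match your dataset.
- ```python
- from PIL import Image
-
- # The six modality labels, in the order used for a combined training image.
- LABELS = ['IHC', 'Hematoxylin', 'DAPI', 'Lap2', 'Marker', 'Seg']
-
- def combine_image_set(input_dir: str, base_name: str) -> Image.Image:
-     """Stitch six 512x512 modality images into one 3072x512 training image."""
-     tiles = [Image.open(f'{input_dir}/{base_name}_{label}.png') for label in LABELS]
-     combined = Image.new('RGB', (512 * len(tiles), 512))
-     for i, tile in enumerate(tiles):
-         combined.paste(tile.convert('RGB').resize((512, 512)), (512 * i, 0))
-     return combined
-
- # combine_image_set('/path/to/input/images', 'sample_01').save('sample_01.png')
- ```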
150
-
151
- ## Training
152
- To train a model:
153
- ```
154
- deepliif train --dataroot /path/to/input/images
155
- --name Model_Name
156
- ```
157
- or
158
- ```
159
- python train.py --dataroot /path/to/input/images
160
- --name Model_Name
161
- ```
162
-
163
- * To view training losses and results, open the URL http://localhost:8097. For cloud servers replace localhost with your IP.
164
- * Epoch-wise intermediate training results are in `DeepLIIF/checkpoints/Model_Name/web/index.html`.
165
- * Trained models will by default be saved in `DeepLIIF/checkpoints/Model_Name`.
166
- * Training datasets can be downloaded [here](https://zenodo.org/record/4751737#.YKRTS0NKhH4).
167
-
168
- **DP**: To train a model you can use DP. DP is single-process. It means that **all the GPUs you want to use must be on the same machine** so that they can be included in the same process - you cannot distribute the training across multiple GPU machines, unless you write your own code to handle inter-node (node = machine) communication.
169
- To split and manage the workload for multiple GPUs within the same process, DP uses multi-threading.
170
- You can find more information on DP [here](https://github.com/nadeemlab/DeepLIIF/blob/main/Multi-GPU%20Training.md).
171
-
172
- To train a model with DP (Example with 2 GPUs (on 1 machine)):
173
- ```
174
- deepliif train --dataroot <data_dir> --batch-size 6 --gpu-ids 0 --gpu-ids 1
175
- ```
176
- Note that `batch-size` is defined per process. Since DP is a single-process method, the `batch-size` you set is the **effective** batch size.
177
-
178
- **DDP**: To train a model you can use DDP. DDP usually spawns multiple processes.
179
- **DeepLIIF's code follows the PyTorch recommendation to spawn 1 process per GPU** ([doc](https://github.com/pytorch/examples/blob/master/distributed/ddp/README.md#application-process-topologies)). If you want to assign multiple GPUs to each process, you will need to make modifications to DeepLIIF's code (see [doc](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html#combine-ddp-with-model-parallelism)).
180
- Despite all the benefits of DDP, one drawback is the extra GPU memory needed for dedicated CUDA buffer for communication. See a short discussion [here](https://discuss.pytorch.org/t/do-dataparallel-and-distributeddataparallel-affect-the-batch-size-and-gpu-memory-consumption/97194/2). In the context of DeepLIIF, this means that there might be situations where you could use a *bigger batch size with DP* as compared to DDP, which may actually train faster than using DDP with a smaller batch size.
181
- You can find more information on DDP [here](https://github.com/nadeemlab/DeepLIIF/blob/main/Multi-GPU%20Training.md).
182
-
183
- To launch training using DDP on a local machine, use `deepliif trainlaunch`. Example with 2 GPUs (on 1 machine):
184
- ```
185
- deepliif trainlaunch --dataroot <data_dir> --batch-size 3 --gpu-ids 0 --gpu-ids 1 --use-torchrun "--nproc_per_node 2"
186
- ```
187
- Note that
188
- 1. `batch-size` is defined per process. Since DDP is a multi-process method, the `batch-size` you set is the batch size for each process, and the **effective** batch size will be `batch-size` multiplied by the number of processes you started. In the above example, it will be 3 * 2 = 6.
189
- 2. You still need to provide **all GPU ids to use** to the training command. Internally, in each process DeepLIIF picks the device using `gpu_ids[local_rank]`. If you provide `--gpu-ids 2 --gpu-ids 3`, the process with local rank 0 will use gpu id 2 and that with local rank 1 will use gpu id 3.
190
- 3. `-t 3 --log_dir <log_dir>` is not required, but is a useful setting in `torchrun` that saves the log from each process to your target log directory. For example:
191
- ```
192
- deepliif trainlaunch --dataroot <data_dir> --batch-size 3 --gpu-ids 0 --gpu-ids 1 --use-torchrun "-t 3 --log_dir <log_dir> --nproc_per_node 2"
193
- ```
194
- 4. If your PyTorch is older than 1.10, DeepLIIF calls `torch.distributed.launch` in the backend. Otherwise, DeepLIIF calls `torchrun`.
195
-
196
- ## Serialize Model
197
- The installed `deepliif` uses Dask to perform inference on the input IHC images.
198
- Before running the `test` command, the model files must be serialized using Torchscript.
199
- To serialize the model files:
200
- ```
201
- deepliif serialize --model-dir /path/to/input/model/files
202
- --output-dir /path/to/output/model/files
203
- ```
204
- * By default, the model files are expected to be located in `DeepLIIF/model-server/DeepLIIF_Latest_Model`.
205
- * By default, the serialized files will be saved to the same directory as the input model files.
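-
- Conceptually, Torchscript serialization amounts to tracing each network with an example input and saving the traced module; the sketch below is a generic PyTorch illustration of that idea (a toy module, not the actual DeepLIIF networks that `deepliif serialize` handles).
- ```python
- import torch
-
- # Toy stand-in for one of the networks; DeepLIIF's real generators differ.
- model = torch.nn.Sequential(torch.nn.Conv2d(3, 3, kernel_size=3, padding=1))
- model.eval()
-
- example = torch.rand(1, 3, 512, 512)      # example input used for tracing
- traced = torch.jit.trace(model, example)  # record the forward pass as Torchscript
- traced.save('model.pt')                   # serialized file, loadable without the Python class
-
- reloaded = torch.jit.load('model.pt')
- ```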
206
-
207
- ## Testing
208
- To test the model:
209
- ```
210
- deepliif test --input-dir /path/to/input/images
211
- --output-dir /path/to/output/images
212
- --model-dir /path/to/the/serialized/model
213
- --tile-size 512
214
- ```
215
- or
216
- ```
217
- python test.py --dataroot /path/to/input/images
218
- --results_dir /path/to/output/images
219
- --checkpoints_dir /path/to/model/files
220
- --name Model_Name
221
- ```
222
- * The latest version of the pretrained models can be downloaded [here](https://zenodo.org/record/4751737#.YKRTS0NKhH4).
223
- * Before running test on images, the model files must be serialized as described above.
224
- * The serialized model files are expected to be located in `DeepLIIF/model-server/DeepLIIF_Latest_Model`.
225
- * The test results will be saved to the specified output directory, which defaults to the input directory.
226
- * The tile size must be specified and is used to split the image into tiles for processing. The tile size is based on the resolution (scan magnification) of the input image, and the recommended values are a tile size of 512 for 40x images, 256 for 20x, and 128 for 10x (see the sketch after this list). Note that the smaller the tile size, the longer inference will take.
227
- * Testing datasets can be downloaded [here](https://zenodo.org/record/4751737#.YKRTS0NKhH4).
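-
- As a rough illustration of the tile-size guidance above (a hypothetical helper, not part of the `deepliif` CLI):
- ```python
- # Recommended --tile-size values by scan magnification, per the guidance above.
- TILE_SIZE_BY_MAGNIFICATION = {'40x': 512, '20x': 256, '10x': 128}
-
- def recommended_tile_size(magnification: str) -> int:
-     """Return the suggested tile size for a given scan magnification."""
-     return TILE_SIZE_BY_MAGNIFICATION[magnification]
- ```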
228
-
229
- **Test Command Options:**
230
- In addition to the required parameters given above, the following optional parameters are available for `deepliif test`:
231
- * `--eager-mode` Run the original model files (instead of serialized model files).
232
- * `--seg-intermediate` Save the intermediate segmentation maps created for each modality.
233
- * `--seg-only` Save only the segmentation files, and do not infer images that are not needed.
234
- * `--color-dapi` Color the inferred DAPI image.
235
- * `--color-marker` Color the inferred marker image.
236
-
237
- **Whole Slide Image (WSI) Inference:**
238
- For translation and segmentation of whole slide images,
239
- you can simply use the `test-wsi` command
240
- giving the path to the directory containing your WSI as the input-dir
241
- and specifying the filename of the WSI.
242
- DeepLIIF automatically reads the WSI region by region,
243
- and translates and segments each region separately, and stitches the regions
244
- to create the translation and segmentation for the whole slide image,
245
- then saves all masks in ome.tiff format in the given output-dir.
246
- Based on the available resources, the region-size can be changed.
247
- ```
248
- deepliif test-wsi --input-dir /path/to/input/image
249
- --filename wsiFile.svs
250
- --output-dir /path/to/output/images
251
- --model-dir /path/to/the/serialized/model
252
- --tile-size 512
253
- ```
254
-
255
- **WSI Inference Options:**
256
- In addition to the required parameters given above, the following optional parameters are available for `deepliif test-wsi`:
257
- * `--region-size` Set the size of each region to read from the WSI (default is 20000).
258
- * `--seg-intermediate` Save the intermediate segmentation maps created for each modality.
259
- * `--seg-only` Save only the segmentation files, and do not infer images that are not needed.
260
- * `--color-dapi` Color the inferred DAPI image.
261
- * `--color-marker` Color the inferred marker image.
262
-
263
- **Reducing Run Time**
264
- If you need only the final segmentation and not the inferred multiplex images,
265
- it is recommended to run `deepliif test` or `deepliif test-wsi` with the `--seg-only`
266
- option. This will generate only the necessary images, thus reducing the overall run time.
267
-
268
- **Torchserve**
269
- If you prefer, it is possible to run the models using Torchserve.
270
- Please see [the documentation](https://nadeemlab.github.io/DeepLIIF/deployment/#deploying-deepliif-with-torchserve)
271
- on how to deploy the model with Torchserve and for an example of how to run the inference.
272
-
273
- ## Docker
274
- We provide a Dockerfile that can be used to run the DeepLIIF models inside a container.
275
- First, you need to install the [Docker Engine](https://docs.docker.com/engine/install/ubuntu/).
276
- After installing Docker, follow these steps:
277
- * Download the pretrained model [here](https://zenodo.org/record/4751737#.YKRTS0NKhH4) and place the files in `DeepLIIF/model-server/DeepLIIF_Latest_Model`.
278
- * To create a docker image from the docker file:
279
- ```
280
- docker build -t cuda/deepliif .
281
- ```
282
- The image is then used as a base from which you can run the application. The application needs an isolated
283
- environment in which to run, referred to as a container.
284
- * To create and run a container:
285
- ```
286
- docker run -it -v `pwd`:`pwd` -w `pwd` cuda/deepliif deepliif test --input-dir Sample_Large_Tissues --tile-size 512
287
- ```
288
- When you run a container from the image, the `deepliif` CLI will be available.
289
- You can easily run any CLI command in the activated environment and copy the results from the docker container to the host.
290
-
291
- ## ImageJ Plugin
292
- If you don't have access to GPU or appropriate hardware and just want to use ImageJ to run inference, we have also created an [ImageJ plugin](https://github.com/nadeemlab/DeepLIIF/tree/main/ImageJ_Plugin) for your convenience.
293
-
294
- ![DeepLIIF ImageJ Demo](images/deepliif-imagej-demo.gif)
295
-
296
- The plugin also supports submitting multiple ROIs at once:
297
-
298
- ![DeepLIIF ImageJ ROI Demo](images/deepliif-imagej-roi-demo.gif)
299
-
300
- ## Cloud Deployment
301
- If you don't have access to GPU or appropriate hardware and don't want to install ImageJ, we have also created a [cloud-native DeepLIIF deployment](https://deepliif.org) with a user-friendly interface to upload images, visualize, interact, and download the final results.
302
-
303
- ![DeepLIIF Website Demo](images/deepliif-website-demo-04.gif)
304
-
305
- Our deployment at [deepliif.org](https://deepliif.org) also provides virtual slide digitization to generate a single stitched image from a 10x video acquired with a microscope and camera. The video should be captured with the following guidelines to achieve the best results:
306
- * Brief but complete pauses at every section of the sample to avoid motion artifacts.
307
- * Significant overlap between pauses so that there is sufficient context for stitching frames together.
308
- * Methodical and consistent movement over the sample. For example, start at the top left corner, then go all the way to the right, then down one step, then all the way to the left, down one step, etc., until the end of the sample is reached. Again, brief overlapping pauses throughout will allow the best quality images to be generated.
309
-
310
- ![DeepLIIF Website Demo](images/deepliif-stitch-demo-01.gif)
311
-
312
- ## Cloud API Endpoints
313
-
314
- For small images, DeepLIIF can also be accessed programmatically through an endpoint by posting a multipart-encoded request containing the original image file, along with optional parameters including postprocessing thresholds:
315
-
316
- ```
317
- POST /api/infer
318
-
319
- File Parameter:
320
-
321
- img (required)
322
- Image on which to run DeepLIIF.
323
-
324
- Query String Parameters:
325
-
326
- resolution
327
- Resolution used to scan the slide (10x, 20x, 40x). Default is 40x.
328
-
329
- pil
330
- If present, use Pillow to load the image instead of Bio-Formats. Pillow is
331
- faster, but works only on common image types (png, jpeg, etc.).
332
-
333
- slim
334
- If present, return only the refined segmentation result image.
335
-
336
- nopost
337
- If present, do not perform postprocessing (returns only inferred images).
338
-
339
- prob_thresh
340
- Probability threshold used in postprocessing the inferred segmentation map
341
- image. The segmentation map value must be above this value in order for a
342
- pixel to be included in the final cell segmentation. Valid values are an
343
- integer in the range 0-254. Default is 150.
344
-
345
- size_thresh
346
- Lower threshold for size gating the cells in postprocessing. Segmented
347
- cells must have more pixels than this value in order to be included in the
348
- final cell segmentation. Valid values are 0, a positive integer, or 'auto'.
349
- 'Auto' will try to automatically determine this lower bound for size gating
350
- based on the distribution of detected cell sizes. Default is 'auto'.
351
-
352
- size_thresh_upper
353
- Upper threshold for size gating the cells in postprocessing. Segmented
354
- cells must have fewer pixels than this value in order to be included in the
355
- final cell segmentation. Valid values are a positive integer or 'none'.
356
- 'None' will use no upper threshold in size gating. Default is 'none'.
357
-
358
- marker_thresh
359
- Threshold for the effect that the inferred marker image will have on the
360
- postprocessing classification of cells as positive. If any corresponding
361
- pixel in the marker image for a cell is above this threshold, the cell will
362
- be classified as being positive regardless of the values from the inferred
363
- segmentation image. Valid values are an integer in the range 0-255, 'none',
364
- or 'auto'. 'None' will not use the marker image during classification.
365
- 'Auto' will automatically determine a threshold from the marker image.
366
- Default is 'auto'.
367
- ```
368
-
369
- For example, in Python:
370
-
371
- ```python
372
- import os
373
- import json
374
- import base64
375
- from io import BytesIO
376
-
377
- import requests
378
- from PIL import Image
379
-
380
- # Use the sample images from the main DeepLIIF repo
381
- images_dir = './Sample_Large_Tissues'
382
- filename = 'ROI_1.png'
383
-
384
- root = os.path.splitext(filename)[0]
385
-
386
- res = requests.post(
387
-     url='https://deepliif.org/api/infer',
388
-     files={
389
-         'img': open(f'{images_dir}/{filename}', 'rb'),
390
-     },
391
-     params={
392
-         'resolution': '40x',
393
-     },
394
- )
395
-
396
- data = res.json()
397
-
398
- def b64_to_pil(b):
399
-     return Image.open(BytesIO(base64.b64decode(b.encode())))
400
-
401
- for name, img in data['images'].items():
402
-     with open(f'{images_dir}/{root}_{name}.png', 'wb') as f:
403
-         b64_to_pil(img).save(f, format='PNG')
404
-
405
- with open(f'{images_dir}/{root}_scoring.json', 'w') as f:
406
-     json.dump(data['scoring'], f, indent=2)
407
- print(json.dumps(data['scoring'], indent=2))
408
- ```
409
-
410
- Note that since this is a single request to send the image and receive the results, processing must complete within the timeout period (typically about one minute). If your request is receiving a 504 status code, please try a smaller image or install the `deepliif` package as detailed above to run the process locally.
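-
- Because of that timeout behavior, it can help to check the response status before decoding the payload; this minimal guard only assumes the 504 behavior described above and reuses `res` from the example.
- ```python
- # Fail fast on gateway timeouts (504) or other errors instead of parsing an error page.
- if res.status_code != 200:
-     raise RuntimeError(
-         f'DeepLIIF API returned status {res.status_code}; '
-         'try a smaller image or run the deepliif package locally.'
-     )
- data = res.json()
- ```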
411
-
412
- If you have previously run DeepLIIF on an image and want to postprocess it with different thresholds, the postprocessing routine can be called directly using the previously inferred results:
413
-
414
- ```
415
- POST /api/postprocess
416
-
417
- File Parameters:
418
-
419
- img (required)
420
- Image on which DeepLIIF was run.
421
-
422
- seg_img (required)
423
- Inferred segmentation image previously generated by DeepLIIF.
424
-
425
- marker_img (optional)
426
- Inferred marker image previously generated by DeepLIIF. If this is
427
- omitted, then the marker image will not be used in classification.
428
-
429
- Query String Parameters:
430
-
431
- resolution
432
- Resolution used to scan the slide (10x, 20x, 40x). Default is 40x.
433
-
434
- pil
435
- If present, use Pillow to load the original image instead of Bio-Formats.
436
- Pillow is faster, but works only on common image types (png, jpeg, etc.).
437
- Pillow is always used to open the seg_img and marker_img files.
438
-
439
- prob_thresh
440
- Probability threshold used in postprocessing the inferred segmentation map
441
- image. The segmentation map value must be above this value in order for a
442
- pixel to be included in the final cell segmentation. Valid values are an
443
- integer in the range 0-254. Default is 150.
444
-
445
- size_thresh
446
- Lower threshold for size gating the cells in postprocessing. Segmented
447
- cells must have more pixels than this value in order to be included in the
448
- final cell segmentation. Valid values are 0, a positive integer, or 'auto'.
449
- 'Auto' will try to automatically determine this lower bound for size gating
450
- based on the distribution of detected cell sizes. Default is 'auto'.
451
-
452
- size_thresh_upper
453
- Upper threshold for size gating the cells in postprocessing. Segmented
454
- cells must have fewer pixels than this value in order to be included in the
455
- final cell segmentation. Valid values are a positive integer or 'none'.
456
- 'None' will use no upper threshold in size gating. Default is 'none'.
457
-
458
- marker_thresh
459
- Threshold for the effect that the inferred marker image will have on the
460
- postprocessing classification of cells as positive. If any corresponding
461
- pixel in the marker image for a cell is above this threshold, the cell will
462
- be classified as being positive regardless of the values from the inferred
463
- segmentation image. Valid values are an integer in the range 0-255, 'none',
464
- or 'auto'. 'None' will not use the marker image during classification.
465
- 'Auto' will automatically determine a threshold from the marker image.
466
- Default is 'auto'. (If marker_img is not supplied, this has no effect.)
467
- ```
468
-
469
- For example, in Python:
470
-
471
- ```python
472
- import os
473
- import json
474
- import base64
475
- from io import BytesIO
476
-
477
- import requests
478
- from PIL import Image
479
-
480
- # Use the sample images from the main DeepLIIF repo
481
- images_dir = './Sample_Large_Tissues'
482
- filename = 'ROI_1.png'
483
-
484
- root = os.path.splitext(filename)[0]
485
-
486
- res = requests.post(
487
-     url='https://deepliif.org/api/postprocess',
488
-     files={
489
-         'img': open(f'{images_dir}/{filename}', 'rb'),
490
-         'seg_img': open(f'{images_dir}/{root}_Seg.png', 'rb'),
491
-         'marker_img': open(f'{images_dir}/{root}_Marker.png', 'rb'),
492
-     },
493
-     params={
494
-         'resolution': '40x',
495
-         'pil': True,
496
-         'size_thresh': 250,
497
-     },
498
- )
499
-
500
- data = res.json()
501
-
502
- def b64_to_pil(b):
503
-     return Image.open(BytesIO(base64.b64decode(b.encode())))
504
-
505
- for name, img in data['images'].items():
506
-     with open(f'{images_dir}/{root}_{name}.png', 'wb') as f:
507
-         b64_to_pil(img).save(f, format='PNG')
508
-
509
- with open(f'{images_dir}/{root}_scoring.json', 'w') as f:
510
-     json.dump(data['scoring'], f, indent=2)
511
- print(json.dumps(data['scoring'], indent=2))
512
- ```
513
-
514
- ## Synthetic Data Generation
515
- The first version of the DeepLIIF model suffered from its inability to separate IHC positive cells in some large clusters,
516
- resulting from the absence of clustered positive cells in our training data. To infuse more information about the
517
- clustered positive cells into our model, we present a novel approach for the synthetic generation of IHC images using
518
- co-registered data.
519
- We design a GAN-based model that receives the Hematoxylin channel, the mpIF DAPI image, and the segmentation mask and
520
- generates the corresponding IHC image. The model converts the Hematoxylin channel to gray-scale to infer more helpful
521
- information such as texture and discards unnecessary information such as color. The Hematoxylin image guides the
522
- network to synthesize the background of the IHC image by preserving the shape and texture of the cells and artifacts in
523
- the background. The DAPI image assists the network in identifying the location, shape, and texture of the cells to
524
- better isolate the cells from the background. The segmentation mask helps the network specify the color of cells based
525
- on the type of the cell (positive cell: a brown hue, negative: a blue hue).
526
-
527
- In the next step, we generate synthetic IHC images with more clustered positive cells. To do so, we change the
528
- segmentation mask by randomly choosing a percentage of the negative cells in the segmentation mask (called Neg-to-Pos) and
529
- converting them into positive cells. Some samples of the synthesized IHC images along with the original IHC image are
530
- shown below.
531
-
532
- ![IHC_Gen_image](docs/training/images/IHC_Gen.jpg)*Overview of synthetic IHC image generation. (a) A training sample
533
- of the IHC-generator model. (b) Some samples of synthesized IHC images using the trained IHC-Generator model. The
534
- Neg-to-Pos shows the percentage of the negative cells in the segmentation mask converted to positive cells.*
535
-
536
- We created a new dataset using the original IHC images and synthetic IHC images. We synthesize each image in the dataset
537
- two times by setting the Neg-to-Pos parameter to 50% and 70%. We re-trained our network with the new dataset. You can
538
- find the new trained model [here](https://zenodo.org/record/4751737/files/DeepLIIF_Latest_Model.zip?download=1).
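-
- As a purely illustrative sketch of the Neg-to-Pos conversion described above (it assumes a mask encoding with positive cells in red and negative cells in blue, as in the DeepLIIF segmentation masks; the GAN-based IHC generator itself is a separate model and not part of the `deepliif` CLI):
- ```python
- import numpy as np
- from skimage.measure import label, regionprops
-
- def neg_to_pos(mask: np.ndarray, ratio: float = 0.5, seed: int = 0) -> np.ndarray:
-     """Recolor a random fraction of negative (blue) cells as positive (red)."""
-     rng = np.random.default_rng(seed)
-     out = mask.copy()
-     # Negative cells: blue-dominant pixels in the RGB mask.
-     negative = (mask[..., 2] > 128) & (mask[..., 0] < 128)
-     labels = label(negative)                    # connected components = individual cells
-     ids = [r.label for r in regionprops(labels)]
-     n_flip = int(len(ids) * ratio)
-     if n_flip == 0:
-         return out
-     chosen = rng.choice(np.array(ids), size=n_flip, replace=False)
-     out[np.isin(labels, chosen)] = [255, 0, 0]  # convert the chosen cells to positive
-     return out
- ```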
539
-
540
- ## Registration
541
- To register the de novo stained mpIF and IHC images, you can use the registration framework in the 'Registration'
542
- directory. Please refer to the README file provided in the same directory for more details.
543
-
544
- ## Contributing Training Data
545
- To train DeepLIIF, we used a dataset of lung and bladder tissues containing IHC, hematoxylin, mpIF DAPI, mpIF Lap2, and
546
- mpIF Ki67 of the same tissue scanned using ZEISS Axioscan. These images were scaled and co-registered with the fixed IHC
547
- images using affine transformations, resulting in 1264 co-registered sets of IHC and corresponding multiplex images of
548
- size 512x512. We randomly selected 575 sets for training, 91 sets for validation, and 598 sets for testing the model.
549
- We also randomly selected and manually segmented 41 images of size 640x640 from the recently released [BCDataset](https://sites.google.com/view/bcdataset)
550
- which contains Ki67 stained sections of breast carcinoma with Ki67+ and Ki67- cell centroid annotations (for cell
551
- detection rather than cell instance segmentation task). We split these tiles into 164 images of size 512x512; the test
552
- set varies widely in the density of tumor cells and the Ki67 index. You can find this dataset [here](https://zenodo.org/record/4751737#.YKRTS0NKhH4).
553
-
554
- We are also creating a self-configurable version of DeepLIIF which will take as input any co-registered H&E/IHC and
555
- multiplex images and produce the optimal output. If you are generating or have generated H&E/IHC and multiplex staining
556
- for the same slide (de novo staining) and would like to contribute that data for DeepLIIF, we can perform
557
- co-registration, whole-cell multiplex segmentation via [ImPartial](https://github.com/nadeemlab/ImPartial), train the
558
- DeepLIIF model, and release it back to the community with full credit to the contributors.
559
-
560
- - [x] **Memorial Sloan Kettering Cancer Center** [AI-ready immunohistochemistry and multiplex immunofluorescence dataset](https://zenodo.org/record/4751737#.YKRTS0NKhH4) for breast, lung, and bladder cancers (**Nature Machine Intelligence'22**)
561
- - [x] **Moffitt Cancer Center** [AI-ready multiplex immunofluorescence and multiplex immunohistochemistry dataset](https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70226184) for head-and-neck squamous cell carcinoma (**MICCAI'23**)
562
-
563
- ## Support
564
- Please use the [GitHub Issues](https://github.com/nadeemlab/DeepLIIF/issues) tab for discussion, questions, or to report bugs related to DeepLIIF.
565
-
566
- ## License
567
- © [Nadeem Lab](https://nadeemlab.org/) - DeepLIIF code is distributed under **Apache 2.0 with Commons Clause** license,
568
- and is available for non-commercial academic purposes.
569
-
570
- ## Acknowledgments
571
- This code is inspired by [CycleGAN and pix2pix in PyTorch](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix).
572
-
573
- ## Funding
574
- This work is funded by the 7-year NIH/NCI R37 MERIT Award ([R37CA295658](https://reporter.nih.gov/search/5dgSOlHosEKepkZEAS5_kQ/project-details/11018883#description)).
575
-
576
- ## Reference
577
- If you find our work useful in your research or if you use parts of this code or our released dataset, please cite the following papers:
578
- ```
579
- @article{ghahremani2022deep,
580
- title={Deep learning-inferred multiplex immunofluorescence for immunohistochemical image quantification},
581
- author={Ghahremani, Parmida and Li, Yanyun and Kaufman, Arie and Vanguri, Rami and Greenwald, Noah and Angelo, Michael and Hollmann, Travis J and Nadeem, Saad},
582
- journal={Nature Machine Intelligence},
583
- volume={4},
584
- number={4},
585
- pages={401--412},
586
- year={2022},
587
- publisher={Nature Publishing Group}
588
- }
589
-
590
- @article{ghahremani2022deepliifui,
591
- title={DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides},
592
- author={Ghahremani, Parmida and Marino, Joseph and Dodds, Ricardo and Nadeem, Saad},
593
- journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
594
- pages={21399--21405},
595
- year={2022}
596
- }
597
-
598
- @article{ghahremani2023deepliifdataset,
599
- title={An AI-Ready Multiplex Staining Dataset for Reproducible and Accurate Characterization of Tumor Immune Microenvironment},
600
- author={Ghahremani, Parmida and Marino, Joseph and Hernandez-Prera, Juan and V. de la Iglesia, Janis and JC Slebos, Robbert and H. Chung, Christine and Nadeem, Saad},
601
- journal={International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
602
- volume={14225},
603
- pages={704--713},
604
- year={2023}
605
- }
606
-
607
- @article{nadeem2023ki67validationMTC,
608
- author = {Nadeem, Saad and Hanna, Matthew G and Viswanathan, Kartik and Marino, Joseph and Ahadi, Mahsa and Alzumaili, Bayan and Bani, Mohamed-Amine and Chiarucci, Federico and Chou, Angela and De Leo, Antonio and Fuchs, Talia L and Lubin, Daniel J and Luxford, Catherine and Magliocca, Kelly and Martinez, Germán and Shi, Qiuying and Sidhu, Stan and Al Ghuzlan, Abir and Gill, Anthony J and Tallini, Giovanni and Ghossein, Ronald and Xu, Bin},
609
- title = {Ki67 proliferation index in medullary thyroid carcinoma: a comparative study of multiple counting methods and validation of image analysis and deep learning platforms},
610
- journal = {Histopathology},
611
- volume = {83},
612
- number = {6},
613
- pages = {981--988},
614
- year = {2023},
615
- doi = {https://doi.org/10.1111/his.15048}
616
- }
617
-
618
- @article{zehra2024deepliifstitch,
619
- author = {Zehra, Talat and Marino, Joseph and Wang, Wendy and Frantsuzov, Grigoriy and Nadeem, Saad},
620
- title = {Rethinking Histology Slide Digitization Workflows for Low-Resource Settings},
621
- journal = {International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
622
- volume = {15004},
623
- pages = {427--436},
624
- year = {2024}
625
- }
626
- ```
627
-
628
-
1
+ Metadata-Version: 2.1
2
+ Name: deepliif
3
+ Version: 1.2.1
4
+ Summary: DeepLIIF: Deep-Learning Inferred Multiplex Immunofluorescence for Immunohistochemical Image Quantification
5
+ Home-page: https://github.com/nadeemlab/DeepLIIF
6
+ Author: Parmida93
7
+ Author-email: ghahremani.parmida@gmail.com
8
+ Keywords: DeepLIIF,IHC,Segmentation,Classification
9
+ Description-Content-Type: text/markdown
10
+ License-File: LICENSE.md
11
+ Requires-Dist: opencv-python (==4.8.1.78)
12
+ Requires-Dist: torch (==1.13.1)
13
+ Requires-Dist: torchvision (==0.14.1)
14
+ Requires-Dist: scikit-image (==0.18.3)
15
+ Requires-Dist: dominate (==2.6.0)
16
+ Requires-Dist: numba (==0.57.1)
17
+ Requires-Dist: Click (==8.0.3)
18
+ Requires-Dist: requests (==2.32.2)
19
+ Requires-Dist: dask (==2021.11.2)
20
+ Requires-Dist: visdom (>=0.1.8.3)
21
+ Requires-Dist: python-bioformats (>=4.0.6)
22
+ Requires-Dist: imagecodecs (==2023.3.16)
23
+ Requires-Dist: zarr (==2.16.1)
24
+
25
+
26
+ <!-- PROJECT LOGO -->
27
+ <br />
28
+ <p align="center">
29
+ <img src="./images/DeepLIIF_logo.png" width="50%">
30
+ <h3 align="center"><strong>Deep-Learning Inferred Multiplex Immunofluorescence for Immunohistochemical Image Quantification</strong></h3>
31
+ <p align="center">
32
+ <a href="https://rdcu.be/cKSBz">Nature MI'22</a>
33
+ |
34
+ <a href="https://openaccess.thecvf.com/content/CVPR2022/html/Ghahremani_DeepLIIF_An_Online_Platform_for_Quantification_of_Clinical_Pathology_Slides_CVPR_2022_paper.html">CVPR'22</a>
35
+ |
36
+ <a href="https://arxiv.org/abs/2305.16465">MICCAI'23</a>
37
+ |
38
+ <a href="https://onlinelibrary.wiley.com/share/author/4AEBAGEHSZE9GDP3H8MN?target=10.1111/his.15048">Histopathology'23</a>
39
+ |
40
+ <a href="https://arxiv.org/abs/2405.08169">MICCAI'24</a>
41
+ |
42
+ <a href="https://deepliif.org/">Cloud Deployment</a>
43
+ |
44
+ <a href="https://nadeemlab.github.io/DeepLIIF/">Documentation</a>
45
+ |
46
+ <a href="#support">Support</a>
47
+ </p>
48
+ </p>
49
+
50
+ *Reporting biomarkers assessed by routine immunohistochemical (IHC) staining of tissue is broadly used in diagnostic
51
+ pathology laboratories for patient care. To date, clinical reporting is predominantly qualitative or semi-quantitative.
52
+ By creating a multitask deep learning framework referred to as DeepLIIF, we present a single-step solution to stain
53
+ deconvolution/separation, cell segmentation, and quantitative single-cell IHC scoring. Leveraging a unique de novo
54
+ dataset of co-registered IHC and multiplex immunofluorescence (mpIF) staining of the same slides, we segment and
55
+ translate low-cost and prevalent IHC slides to more expensive-yet-informative mpIF images, while simultaneously
56
+ providing the essential ground truth for the superimposed brightfield IHC channels. Moreover, a new nuclear-envelope
57
+ stain, LAP2beta, with high (>95%) cell coverage is introduced to improve cell delineation/segmentation and protein
58
+ expression quantification on IHC slides. By simultaneously translating input IHC images to clean/separated mpIF channels
59
+ and performing cell segmentation/classification, we show that our model trained on clean IHC Ki67 data can generalize to
60
+ more noisy and artifact-ridden images as well as other nuclear and non-nuclear markers such as CD3, CD8, BCL2, BCL6,
61
+ MYC, MUM1, CD10, and TP53. We thoroughly evaluate our method on publicly available benchmark datasets as well as against
62
+ pathologists' semi-quantitative scoring. Trained on IHC, DeepLIIF generalizes well to H&E images for out-of-the-box nuclear
63
+ segmentation.*
64
+
65
+ **DeepLIIF** is deployed as a free publicly available cloud-native platform (https://deepliif.org) with Bioformats (more than 150 input formats supported) and MLOps pipeline. We also release **DeepLIIF** implementations for single/multi-GPU training, Torchserve/Dask+Torchscript deployment, and auto-scaling via Pulumi (1000s of concurrent connections supported); details can be found in our [documentation](https://nadeemlab.github.io/DeepLIIF/). **DeepLIIF** can be run locally (GPU required) by [pip installing the package](https://github.com/nadeemlab/DeepLIIF/edit/main/README.md#installing-deepliif) and using the deepliif CLI command. **DeepLIIF** can be used remotely (no GPU required) through the https://deepliif.org website, calling the [cloud API via Python](https://github.com/nadeemlab/DeepLIIF/edit/main/README.md#cloud-deployment), or via the [ImageJ/Fiji plugin](https://github.com/nadeemlab/DeepLIIF/edit/main/README.md#imagej-plugin); details for the free cloud-native platform can be found in our [CVPR'22 paper](https://arxiv.org/pdf/2204.04494.pdf).
66
+
67
+ © This code is made available for non-commercial academic purposes.
68
+
69
+ ![Version](https://img.shields.io/static/v1?label=latest&message=v1.2.1&color=darkgreen)
70
+ [![Total Downloads](https://static.pepy.tech/personalized-badge/deepliif?period=total&units=international_system&left_color=grey&right_color=blue&left_text=total%20downloads)](https://pepy.tech/project/deepliif?&left_text=totalusers)
71
+
72
+ ![overview_image](./images/overview.png)*Overview of DeepLIIF pipeline and sample input IHCs (different
73
+ brown/DAB markers -- BCL2, BCL6, CD10, CD3/CD8, Ki67) with corresponding DeepLIIF-generated hematoxylin/mpIF modalities
74
+ and classified (positive (red) and negative (blue) cell) segmentation masks. (a) Overview of DeepLIIF. Given an IHC
75
+ input, our multitask deep learning framework simultaneously infers corresponding Hematoxylin channel, mpIF DAPI, mpIF
76
+ protein expression (Ki67, CD3, CD8, etc.), and the positive/negative protein cell segmentation, baking explainability
77
+ and interpretability into the model itself rather than relying on coarse activation/attention maps. In the segmentation
78
+ mask, the red cells denote cells with positive protein expression (brown/DAB cells in the input IHC), whereas blue cells
79
+ represent negative cells (blue cells in the input IHC). (b) Example DeepLIIF-generated hematoxylin/mpIF modalities and
80
+ segmentation masks for different IHC markers. DeepLIIF, trained on clean IHC Ki67 nuclear marker images, can generalize
81
+ to noisier as well as other IHC nuclear/cytoplasmic marker images.*
82
+
83
+ ## Prerequisites
84
+ 1. Python 3.8
85
+ 2. Docker
86
+
87
+ ## Installing `deepliif`
88
+
89
+ DeepLIIF can be `pip` installed:
90
+ ```shell
91
+ $ conda create --name deepliif_env python=3.8
92
+ $ conda activate deepliif_env
93
+ (deepliif_env) $ conda install -c conda-forge openjdk
94
+ (deepliif_env) $ pip install deepliif
95
+ ```
96
+
97
+ The package is composed of two parts:
98
+ 1. A library that implements the core functions used to train and test DeepLIIF models.
99
+ 2. A CLI to run common batch operations including training, batch testing, and Torchscript model serialization.
100
+
101
+ You can list all available commands:
102
+
103
+ ```
104
+ (venv) $ deepliif --help
105
+ Usage: deepliif [OPTIONS] COMMAND [ARGS]...
106
+
107
+ Options:
108
+ --help Show this message and exit.
109
+
110
+ Commands:
111
+ prepare-testing-data Preparing data for testing
112
+ serialize Serialize DeepLIIF models using Torchscript
113
+ test Test trained models
114
+ train General-purpose training script for multi-task...
115
+ ```
116
+
117
+ **Note:** You might need to install a version of PyTorch that is compatible with your CUDA version.
118
+ Otherwise, only the CPU will be used.
119
+ Visit the [PyTorch website](https://pytorch.org/) for details.
120
+ You can confirm if your installation will run on the GPU by checking if the following returns `True`:
121
+
122
+ ```
123
+ import torch
124
+ torch.cuda.is_available()
125
+ ```
126
+
127
+ ## Training Dataset
128
+ For training, all images must be 512x512 and combined into 3072x512 images (six images of size 512x512 stitched
129
+ together horizontally).
130
+ The data need to be arranged in the following order:
131
+ ```
132
+ XXX_Dataset
133
+ ├── train
134
+ └── val
135
+ ```
136
+ We have provided a simple function in the CLI for preparing data for training.
137
+
138
+ * **To prepare data for training**, you need to have all of the images for each image set (IHC, Hematoxylin Channel, mpIF DAPI, mpIF Lap2, mpIF marker, and segmentation mask) in the input directory.
139
+ Each of the six images for a single image set must have the same naming format, with only the name of the label for the type of image differing between them. The label names must be, respectively: IHC, Hematoxylin, DAPI, Lap2, Marker, Seg.
140
+ The command takes the address of the directory containing image set data and the address of the output dataset directory.
141
+ It first creates the train and validation directories inside the given output dataset directory.
142
+ It then reads all of the images in the input directory and saves the combined image in the train or validation directory, based on the given `validation_ratio`.
143
+ ```
144
+ deepliif prepare-training-data --input-dir /path/to/input/images
145
+ --output-dir /path/to/output/images
146
+ --validation-ratio 0.2
147
+ ```
148
+
149
+ ## Training
150
+ To train a model:
151
+ ```
152
+ deepliif train --dataroot /path/to/input/images
153
+ --name Model_Name
154
+ ```
155
+ or
156
+ ```
157
+ python train.py --dataroot /path/to/input/images
158
+ --name Model_Name
159
+ ```
160
+
161
+ * To view training losses and results, open the URL http://localhost:8097. For cloud servers replace localhost with your IP.
162
+ * Epoch-wise intermediate training results are in `DeepLIIF/checkpoints/Model_Name/web/index.html`.
163
+ * Trained models will by default be saved in `DeepLIIF/checkpoints/Model_Name`.
164
+ * Training datasets can be downloaded [here](https://zenodo.org/record/4751737#.YKRTS0NKhH4).
165
+
166
+ **DP**: To train a model you can use DP. DP is single-process. It means that **all the GPUs you want to use must be on the same machine** so that they can be included in the same process - you cannot distribute the training across multiple GPU machines, unless you write your own code to handle inter-node (node = machine) communication.
167
+ To split and manage the workload for multiple GPUs within the same process, DP uses multi-threading.
168
+ You can find more information on DP [here](https://github.com/nadeemlab/DeepLIIF/blob/main/Multi-GPU%20Training.md).
169
+
170
+ To train a model with DP (Example with 2 GPUs (on 1 machine)):
171
+ ```
172
+ deepliif train --dataroot <data_dir> --batch-size 6 --gpu-ids 0 --gpu-ids 1
173
+ ```
174
+ Note that `batch-size` is defined per process. Since DP is a single-process method, the `batch-size` you set is the **effective** batch size.
175
+
176
+ **DDP**: To train a model you can use DDP. DDP usually spawns multiple processes.
177
+ **DeepLIIF's code follows the PyTorch recommendation to spawn 1 process per GPU** ([doc](https://github.com/pytorch/examples/blob/master/distributed/ddp/README.md#application-process-topologies)). If you want to assign multiple GPUs to each process, you will need to make modifications to DeepLIIF's code (see [doc](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html#combine-ddp-with-model-parallelism)).
178
+ Despite all the benefits of DDP, one drawback is the extra GPU memory needed for dedicated CUDA buffer for communication. See a short discussion [here](https://discuss.pytorch.org/t/do-dataparallel-and-distributeddataparallel-affect-the-batch-size-and-gpu-memory-consumption/97194/2). In the context of DeepLIIF, this means that there might be situations where you could use a *bigger batch size with DP* as compared to DDP, which may actually train faster than using DDP with a smaller batch size.
179
+ You can find more information on DDP [here](https://github.com/nadeemlab/DeepLIIF/blob/main/Multi-GPU%20Training.md).
180
+
181
+ To launch training using DDP on a local machine, use `deepliif trainlaunch`. Example with 2 GPUs (on 1 machine):
182
+ ```
183
+ deepliif trainlaunch --dataroot <data_dir> --batch-size 3 --gpu-ids 0 --gpu-ids 1 --use-torchrun "--nproc_per_node 2"
184
+ ```
185
+ Note that
186
+ 1. `batch-size` is defined per process. Since DDP is a multi-process method, the `batch-size` you set is the batch size for each process, and the **effective** batch size will be `batch-size` multiplied by the number of processes you started. In the above example, it will be 3 * 2 = 6.
187
+ 2. You still need to provide **all GPU ids to use** to the training command. Internally, in each process DeepLIIF picks the device using `gpu_ids[local_rank]`. If you provide `--gpu-ids 2 --gpu-ids 3`, the process with local rank 0 will use gpu id 2 and that with local rank 1 will use gpu id 3.
188
+ 3. `-t 3 --log_dir <log_dir>` is not required, but is a useful setting in `torchrun` that saves the log from each process to your target log directory. For example:
189
+ ```
190
+ deepliif trainlaunch --dataroot <data_dir> --batch-size 3 --gpu-ids 0 --gpu-ids 1 --use-torchrun "-t 3 --log_dir <log_dir> --nproc_per_node 2"
191
+ ```
192
+ 4. If your PyTorch is older than 1.10, DeepLIIF calls `torch.distributed.launch` in the backend. Otherwise, DeepLIIF calls `torchrun`.
193
+
194
+ ## Serialize Model
195
+ The installed `deepliif` uses Dask to perform inference on the input IHC images.
196
+ Before running the `test` command, the model files must be serialized using Torchscript.
197
+ To serialize the model files:
198
+ ```
199
+ deepliif serialize --model-dir /path/to/input/model/files
200
+ --output-dir /path/to/output/model/files
201
+ ```
202
+ * By default, the model files are expected to be located in `DeepLIIF/model-server/DeepLIIF_Latest_Model`.
203
+ * By default, the serialized files will be saved to the same directory as the input model files.
204
+
205
+ ## Testing
206
+ To test the model:
207
+ ```
208
+ deepliif test --input-dir /path/to/input/images
209
+ --output-dir /path/to/output/images
210
+ --model-dir /path/to/the/serialized/model
211
+ --tile-size 512
212
+ ```
213
+ or
214
+ ```
215
+ python test.py --dataroot /path/to/input/images
216
+ --results_dir /path/to/output/images
217
+ --checkpoints_dir /path/to/model/files
218
+ --name Model_Name
219
+ ```
220
+ * The latest version of the pretrained models can be downloaded [here](https://zenodo.org/record/4751737#.YKRTS0NKhH4).
221
+ * Before running test on images, the model files must be serialized as described above.
222
+ * The serialized model files are expected to be located in `DeepLIIF/model-server/DeepLIIF_Latest_Model`.
223
+ * The test results will be saved to the specified output directory, which defaults to the input directory.
224
+ * The tile size must be specified and is used to split the image into tiles for processing. The tile size is based on the resolution (scan magnification) of the input image, and the recommended values are a tile size of 512 for 40x images, 256 for 20x, and 128 for 10x. Note that the smaller the tile size, the longer inference will take.
225
+ * Testing datasets can be downloaded [here](https://zenodo.org/record/4751737#.YKRTS0NKhH4).
226
+
227
+ **Test Command Options:**
228
+ In addition to the required parameters given above, the following optional parameters are available for `deepliif test`:
229
+ * `--eager-mode` Run the original model files (instead of serialized model files).
230
+ * `--seg-intermediate` Save the intermediate segmentation maps created for each modality.
231
+ * `--seg-only` Save only the segmentation files, and do not infer images that are not needed.
232
+ * `--color-dapi` Color the inferred DAPI image.
233
+ * `--color-marker` Color the inferred marker image.
234
+
235
+ **Whole Slide Image (WSI) Inference:**
236
+ For translation and segmentation of whole slide images,
237
+ you can simply use the `test-wsi` command
238
+ giving the path to the directory containing your WSI as the input-dir
239
+ and specifying the filename of the WSI.
240
+ DeepLIIF automatically reads the WSI region by region,
241
+ and translates and segments each region separately, and stitches the regions
242
+ to create the translation and segmentation for the whole slide image,
243
+ then saves all masks in ome.tiff format in the given output-dir.
244
+ Based on the available resources, the region-size can be changed.
245
+ ```
246
+ deepliif test-wsi --input-dir /path/to/input/image
247
+ --filename wsiFile.svs
248
+ --output-dir /path/to/output/images
249
+ --model-dir /path/to/the/serialized/model
250
+ --tile-size 512
251
+ ```
252
+
253
+ **WSI Inference Options:**
254
+ In addition to the required parameters given above, the following optional parameters are available for `deepliif test-wsi`:
255
+ * `--region-size` Set the size of each region to read from the WSI (default is 20000).
256
+ * `--seg-intermediate` Save the intermediate segmentation maps created for each modality.
257
+ * `--seg-only` Save only the segmentation files, and do not infer images that are not needed.
258
+ * `--color-dapi` Color the inferred DAPI image.
259
+ * `--color-marker` Color the inferred marker image.
260
+
261
+ **Reducing Run Time**
262
+ If you need only the final segmentation and not the inferred multiplex images,
263
+ it is recommended to run `deepliif test` or `deepliif test-wsi` with the `--seg-only`
264
+ option. This will generate only the necessary images, thus reducing the overall run time.
265
+
266
+ **Torchserve**
267
+ If you prefer, it is possible to run the models using Torchserve.
268
+ Please see [the documentation](https://nadeemlab.github.io/DeepLIIF/deployment/#deploying-deepliif-with-torchserve)
269
+ on how to deploy the model with Torchserve and for an example of how to run the inference.
270
+
271
+ ## Docker
272
+ We provide a Dockerfile that can be used to run the DeepLIIF models inside a container.
273
+ First, you need to install the [Docker Engine](https://docs.docker.com/engine/install/ubuntu/).
274
+ After installing Docker, follow these steps:
275
+ * Download the pretrained model [here](https://zenodo.org/record/4751737#.YKRTS0NKhH4) and place the files in `DeepLIIF/model-server/DeepLIIF_Latest_Model`.
276
+ * To create a docker image from the docker file:
277
+ ```
278
+ docker build -t cuda/deepliif .
279
+ ```
280
+ The image is then used as a base from which you can run the application. The application needs an isolated
281
+ environment in which to run, referred to as a container.
282
+ * To create and run a container:
283
+ ```
284
+ docker run -it -v `pwd`:`pwd` -w `pwd` cuda/deepliif deepliif test --input-dir Sample_Large_Tissues --tile-size 512
285
+ ```
286
+ When you run a container from the image, the `deepliif` CLI will be available.
287
+ You can easily run any CLI command in the activated environment and copy the results from the docker container to the host.
288
+
289
+ ## ImageJ Plugin
290
+ If you don't have access to GPU or appropriate hardware and just want to use ImageJ to run inference, we have also created an [ImageJ plugin](https://github.com/nadeemlab/DeepLIIF/tree/main/ImageJ_Plugin) for your convenience.
291
+
292
+ ![DeepLIIF ImageJ Demo](images/deepliif-imagej-demo.gif)
293
+
294
+ The plugin also supports submitting multiple ROIs at once:
295
+
296
+ ![DeepLIIF ImageJ ROI Demo](images/deepliif-imagej-roi-demo.gif)
297
+
298
+ ## Cloud Deployment
299
+ If you don't have access to GPU or appropriate hardware and don't want to install ImageJ, we have also created a [cloud-native DeepLIIF deployment](https://deepliif.org) with a user-friendly interface to upload images, visualize, interact, and download the final results.
300
+
301
+ ![DeepLIIF Website Demo](images/deepliif-website-demo-04.gif)
302
+
303
+ Our deployment at [deepliif.org](https://deepliif.org) also provides virtual slide digitization to generate a single stitched image from a 10x video acquired with a microscope and camera. The video should be captured with the following guidelines to achieve the best results:
304
+ * Brief but complete pauses at every section of the sample to avoid motion artifacts.
305
+ * Significant overlap between pauses so that there is sufficient context for stitching frames together.
306
+ * Methodical and consistent movement over the sample. For example, start at the top left corner, then go all the way to the right, then down one step, then all the way to the left, down one step, etc., until the end of the sample is reached. Again, brief overlapping pauses throughout will allow the best quality images to be generated.
307
+
308
+ ![DeepLIIF Website Demo](images/deepliif-stitch-demo-01.gif)
309
+
310
+ ## Cloud API Endpoints
311
+
312
+ For small images, DeepLIIF can also be accessed programmatically through an endpoint by posting a multipart-encoded request containing the original image file, along with optional parameters including postprocessing thresholds:
313
+
314
+ ```
315
+ POST /api/infer
316
+
317
+ File Parameter:
318
+
319
+ img (required)
320
+ Image on which to run DeepLIIF.
321
+
322
+ Query String Parameters:
323
+
324
+ resolution
325
+ Resolution used to scan the slide (10x, 20x, 40x). Default is 40x.
326
+
327
+ pil
328
+ If present, use Pillow to load the image instead of Bio-Formats. Pillow is
329
+ faster, but works only on common image types (png, jpeg, etc.).
330
+
331
+ slim
332
+ If present, return only the refined segmentation result image.
333
+
334
+ nopost
335
+ If present, do not perform postprocessing (returns only inferred images).
336
+
337
+ prob_thresh
338
+ Probability threshold used in postprocessing the inferred segmentation map
339
+ image. The segmentation map value must be above this value in order for a
340
+ pixel to be included in the final cell segmentation. Valid values are an
341
+ integer in the range 0-254. Default is 150.
342
+
343
+ size_thresh
344
+ Lower threshold for size gating the cells in postprocessing. Segmented
345
+ cells must have more pixels than this value in order to be included in the
346
+ final cell segmentation. Valid values are 0, a positive integer, or 'auto'.
347
+ 'Auto' will try to automatically determine this lower bound for size gating
348
+ based on the distribution of detected cell sizes. Default is 'auto'.
349
+
350
+ size_thresh_upper
351
+ Upper threshold for size gating the cells in postprocessing. Segmented
352
+ cells must have fewer pixels than this value in order to be included in the
353
+ final cell segmentation. Valid values are a positive integer or 'none'.
354
+ 'None' will use no upper threshold in size gating. Default is 'none'.
355
+
356
+ marker_thresh
357
+ Threshold for the effect that the inferred marker image will have on the
358
+ postprocessing classification of cells as positive. If any corresponding
359
+ pixel in the marker image for a cell is above this threshold, the cell will
360
+ be classified as being positive regardless of the values from the inferred
361
+ segmentation image. Valid values are an integer in the range 0-255, 'none',
362
+ or 'auto'. 'None' will not use the marker image during classification.
363
+ 'Auto' will automatically determine a threshold from the marker image.
364
+ Default is 'auto'.
365
+ ```
366
+
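+ As a quick check from the command line, an equivalent request can also be posted with curl (a minimal sketch; the sample file matches the Python example below, and the JSON response with base64-encoded images is written to response.json):
+ ```
+ curl -X POST 'https://deepliif.org/api/infer?resolution=40x' \
+     -F 'img=@./Sample_Large_Tissues/ROI_1.png' \
+     -o response.json
+ ```
+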
367
+ For example, in Python:
368
+
369
+ ```python
370
+ import os
371
+ import json
372
+ import base64
373
+ from io import BytesIO
374
+
375
+ import requests
376
+ from PIL import Image
377
+
378
+ # Use the sample images from the main DeepLIIF repo
379
+ images_dir = './Sample_Large_Tissues'
380
+ filename = 'ROI_1.png'
381
+
382
+ root = os.path.splitext(filename)[0]
383
+
384
+ res = requests.post(
385
+ url='https://deepliif.org/api/infer',
386
+ files={
387
+ 'img': open(f'{images_dir}/{filename}', 'rb'),
388
+ },
389
+ params={
390
+ 'resolution': '40x',
391
+ },
392
+ )
393
+
394
+ data = res.json()
395
+
396
+ def b64_to_pil(b):
397
+ return Image.open(BytesIO(base64.b64decode(b.encode())))
398
+
399
+ for name, img in data['images'].items():
400
+ with open(f'{images_dir}/{root}_{name}.png', 'wb') as f:
401
+ b64_to_pil(img).save(f, format='PNG')
402
+
403
+ with open(f'{images_dir}/{root}_scoring.json', 'w') as f:
404
+ json.dump(data['scoring'], f, indent=2)
405
+ print(json.dumps(data['scoring'], indent=2))
406
+ ```
407
+
408
+ Note that since this is a single request to send the image and receive the results, processing must complete within the timeout period (typically about one minute). If your request is receiving a 504 status code, please try a smaller image or install the `deepliif` package as detailed above to run the process locally.
409
+
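+ If you are scripting many requests, a small guard around the call in the example above makes the timeout case explicit (a sketch reusing the `res` object from that example):
+ ```python
+ # Check the HTTP status before parsing; long-running requests may come back as 504.
+ if res.ok:
+     data = res.json()
+ else:
+     print(f'Request failed with status {res.status_code}; '
+           'try a smaller image or run deepliif locally.')
+ ```
+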
410
+ If you have previously run DeepLIIF on an image and want to postprocess it with different thresholds, the postprocessing routine can be called directly using the previously inferred results:
411
+
412
+ ```
413
+ POST /api/postprocess
414
+
415
+ File Parameters:
416
+
417
+ img (required)
418
+ Image on which DeepLIIF was run.
419
+
420
+ seg_img (required)
421
+ Inferred segmentation image previously generated by DeepLIIF.
422
+
423
+ marker_img (optional)
424
+ Inferred marker image previously generated by DeepLIIF. If this is
425
+ omitted, then the marker image will not be used in classification.
426
+
427
+ Query String Parameters:
428
+
429
+ resolution
430
+ Resolution used to scan the slide (10x, 20x, 40x). Default is 40x.
431
+
432
+ pil
433
+ If present, use Pillow to load the original image instead of Bio-Formats.
434
+ Pillow is faster, but works only on common image types (png, jpeg, etc.).
435
+ Pillow is always used to open the seg_img and marker_img files.
436
+
437
+ prob_thresh
438
+ Probability threshold used in postprocessing the inferred segmentation map
439
+ image. The segmentation map value must be above this value in order for a
440
+ pixel to be included in the final cell segmentation. Valid values are an
441
+ integer in the range 0-254. Default is 150.
442
+
443
+ size_thresh
444
+ Lower threshold for size gating the cells in postprocessing. Segmented
445
+ cells must have more pixels than this value in order to be included in the
446
+ final cell segmentation. Valid values are 0, a positive integer, or 'auto'.
447
+ 'Auto' will try to automatically determine this lower bound for size gating
448
+ based on the distribution of detected cell sizes. Default is 'auto'.
449
+
450
+ size_thresh_upper
451
+ Upper threshold for size gating the cells in postprocessing. Segmented
452
+ cells must have fewer pixels than this value in order to be included in the
453
+ final cell segmentation. Valid values are a positive integer or 'none'.
454
+ 'None' will use no upper threshold in size gating. Default is 'none'.
455
+
456
+ marker_thresh
457
+ Threshold for the effect that the inferred marker image will have on the
458
+ postprocessing classification of cells as positive. If any corresponding
459
+ pixel in the marker image for a cell is above this threshold, the cell will
460
+ be classified as being positive regardless of the values from the inferred
461
+ segmentation image. Valid values are an integer in the range 0-255, 'none',
462
+ or 'auto'. 'None' will not use the marker image during classification.
463
+ 'Auto' will automatically determine a threshold from the marker image.
464
+ Default is 'auto'. (If marker_img is not supplied, this has no effect.)
465
+ ```
466
+
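+ The same endpoint can be exercised with curl (a sketch; the file names follow the naming used in the Python example below):
+ ```
+ curl -X POST 'https://deepliif.org/api/postprocess?resolution=40x&pil&size_thresh=250' \
+     -F 'img=@./Sample_Large_Tissues/ROI_1.png' \
+     -F 'seg_img=@./Sample_Large_Tissues/ROI_1_Seg.png' \
+     -F 'marker_img=@./Sample_Large_Tissues/ROI_1_Marker.png' \
+     -o response.json
+ ```
+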
467
+ For example, in Python:
468
+
469
+ ```python
470
+ import os
471
+ import json
472
+ import base64
473
+ from io import BytesIO
474
+
475
+ import requests
476
+ from PIL import Image
477
+
478
+ # Use the sample images from the main DeepLIIF repo
479
+ images_dir = './Sample_Large_Tissues'
480
+ filename = 'ROI_1.png'
481
+
482
+ root = os.path.splitext(filename)[0]
483
+
484
+ res = requests.post(
485
+ url='https://deepliif.org/api/postprocess',
486
+ files={
487
+ 'img': open(f'{images_dir}/{filename}', 'rb'),
488
+ 'seg_img': open(f'{images_dir}/{root}_Seg.png', 'rb'),
489
+ 'marker_img': open(f'{images_dir}/{root}_Marker.png', 'rb'),
490
+ },
491
+ params={
492
+ 'resolution': '40x',
493
+ 'pil': True,
494
+ 'size_thresh': 250,
495
+ },
496
+ )
497
+
498
+ data = res.json()
499
+
500
+ def b64_to_pil(b):
501
+ return Image.open(BytesIO(base64.b64decode(b.encode())))
502
+
503
+ for name, img in data['images'].items():
504
+ with open(f'{images_dir}/{root}_{name}.png', 'wb') as f:
505
+ b64_to_pil(img).save(f, format='PNG')
506
+
507
+ with open(f'{images_dir}/{root}_scoring.json', 'w') as f:
508
+ json.dump(data['scoring'], f, indent=2)
509
+ print(json.dumps(data['scoring'], indent=2))
510
+ ```
511
+
512
+ ## Synthetic Data Generation
513
+ The first version of the DeepLIIF model suffered from an inability to separate IHC-positive cells in some large clusters,
514
+ resulting from the absence of clustered positive cells in our training data. To infuse more information about the
515
+ clustered positive cells into our model, we present a novel approach for the synthetic generation of IHC images using
516
+ co-registered data.
517
+ We design a GAN-based model that receives the Hematoxylin channel, the mpIF DAPI image, and the segmentation mask and
518
+ generates the corresponding IHC image. The model converts the Hematoxylin channel to grayscale to retain helpful
519
+ information such as texture and discard unnecessary information such as color. The Hematoxylin image guides the
520
+ network to synthesize the background of the IHC image by preserving the shape and texture of the cells and artifacts in
521
+ the background. The DAPI image assists the network in identifying the location, shape, and texture of the cells to
522
+ better isolate the cells from the background. The segmentation mask helps the network specify the color of cells based
523
+ on the type of the cell (positive cell: a brown hue, negative: a blue hue).
524
+
525
+ In the next step, we generate synthetic IHC images with more clustered positive cells. To do so, we change the
526
+ segmentation mask by choosing a percentage of random negative cells in the segmentation mask (referred to as Neg-to-Pos) and
527
+ converting them into positive cells. Some samples of the synthesized IHC images along with the original IHC image are
528
+ shown below.
529
+
530
+ ![IHC_Gen_image](docs/training/images/IHC_Gen.jpg)*Overview of synthetic IHC image generation. (a) A training sample
531
+ of the IHC-generator model. (b) Some samples of synthesized IHC images using the trained IHC-Generator model. The
532
+ Neg-to-Pos shows the percentage of the negative cells in the segmentation mask converted to positive cells.*
533
+
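+ As a rough illustration of the Neg-to-Pos conversion (not the released training code), the following sketch flips a chosen percentage of randomly selected negative cells to positive in a hypothetical per-cell class array, where 0 marks a negative cell and 1 a positive cell:
+ ```python
+ import numpy as np
+
+ def neg_to_pos(cell_classes, pct, seed=0):
+     """Flip `pct` percent of the negative cells (label 0) to positive (label 1)."""
+     rng = np.random.default_rng(seed)
+     classes = cell_classes.copy()
+     neg_idx = np.flatnonzero(classes == 0)           # indices of negative cells
+     n_flip = int(round(len(neg_idx) * pct / 100))    # how many to convert
+     flip = rng.choice(neg_idx, size=n_flip, replace=False)
+     classes[flip] = 1                                # rendered with the positive (brown) hue
+     return classes
+
+ # e.g. Neg-to-Pos of 50% and 70%, the two settings used for the synthetic dataset
+ classes = np.array([0, 0, 1, 0, 1, 0, 0, 1])
+ print(neg_to_pos(classes, 50))
+ print(neg_to_pos(classes, 70))
+ ```
+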
534
+ We created a new dataset using the original IHC images and synthetic IHC images. We synthesized each image in the dataset
535
+ twice by setting the Neg-to-Pos parameter to 50% and 70%. We re-trained our network with the new dataset. You can
536
+ find the new trained model [here](https://zenodo.org/record/4751737/files/DeepLIIF_Latest_Model.zip?download=1).
537
+
538
+ ## Registration
539
+ To register the de novo stained mpIF and IHC images, you can use the registration framework in the 'Registration'
540
+ directory. Please refer to the README file provided in the same directory for more details.
541
+
542
+ ## Contributing Training Data
543
+ To train DeepLIIF, we used a dataset of lung and bladder tissues containing IHC, hematoxylin, mpIF DAPI, mpIF Lap2, and
544
+ mpIF Ki67 of the same tissue scanned using ZEISS Axioscan. These images were scaled and co-registered with the fixed IHC
545
+ images using affine transformations, resulting in 1264 co-registered sets of IHC and corresponding multiplex images of
546
+ size 512x512. We randomly selected 575 sets for training, 91 sets for validation, and 598 sets for testing the model.
547
+ We also randomly selected and manually segmented 41 images of size 640x640 from the recently released [BCDataset](https://sites.google.com/view/bcdataset),
548
+ which contains Ki67 stained sections of breast carcinoma with Ki67+ and Ki67- cell centroid annotations (for cell
549
+ detection task rather than cell instance segmentation). We split these tiles into 164 images of size 512x512; the test
550
+ set varies widely in the density of tumor cells and the Ki67 index. You can find this dataset [here](https://zenodo.org/record/4751737#.YKRTS0NKhH4).
551
+
552
+ We are also creating a self-configurable version of DeepLIIF which will take as input any co-registered H&E/IHC and
553
+ multiplex images and produce the optimal output. If you are generating or have generated H&E/IHC and multiplex staining
554
+ for the same slide (de novo staining) and would like to contribute that data for DeepLIIF, we can perform
555
+ co-registration, whole-cell multiplex segmentation via [ImPartial](https://github.com/nadeemlab/ImPartial), train the
556
+ DeepLIIF model, and release it back to the community with full credit to the contributors.
557
+
558
+ - [x] **Memorial Sloan Kettering Cancer Center** [AI-ready immunohistochemistry and multiplex immunofluorescence dataset](https://zenodo.org/record/4751737#.YKRTS0NKhH4) for breast, lung, and bladder cancers (**Nature Machine Intelligence'22**)
559
+ - [x] **Moffitt Cancer Center** [AI-ready multiplex immunofluorescence and multiplex immunohistochemistry dataset](https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70226184) for head-and-neck squamous cell carcinoma (**MICCAI'23**)
560
+
561
+ ## Support
562
+ Please use the [GitHub Issues](https://github.com/nadeemlab/DeepLIIF/issues) tab for discussion, questions, or to report bugs related to DeepLIIF.
563
+
564
+ ## License
565
+ © [Nadeem Lab](https://nadeemlab.org/) - DeepLIIF code is distributed under **Apache 2.0 with Commons Clause** license,
566
+ and is available for non-commercial academic purposes.
567
+
568
+ ## Acknowledgments
569
+ This code is inspired by [CycleGAN and pix2pix in PyTorch](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix).
570
+
571
+ ## Funding
572
+ This work is funded by the 7-year NIH/NCI R37 MERIT Award ([R37CA295658](https://reporter.nih.gov/search/5dgSOlHosEKepkZEAS5_kQ/project-details/11018883#description)).
573
+
574
+ ## Reference
575
+ If you find our work useful in your research or if you use parts of this code or our released dataset, please cite the following papers:
576
+ ```
577
+ @article{ghahremani2022deep,
578
+ title={Deep learning-inferred multiplex immunofluorescence for immunohistochemical image quantification},
579
+ author={Ghahremani, Parmida and Li, Yanyun and Kaufman, Arie and Vanguri, Rami and Greenwald, Noah and Angelo, Michael and Hollmann, Travis J and Nadeem, Saad},
580
+ journal={Nature Machine Intelligence},
581
+ volume={4},
582
+ number={4},
583
+ pages={401--412},
584
+ year={2022},
585
+ publisher={Nature Publishing Group}
586
+ }
587
+
588
+ @article{ghahremani2022deepliifui,
589
+ title={DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides},
590
+ author={Ghahremani, Parmida and Marino, Joseph and Dodds, Ricardo and Nadeem, Saad},
591
+ journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
592
+ pages={21399--21405},
593
+ year={2022}
594
+ }
595
+
596
+ @article{ghahremani2023deepliifdataset,
597
+ title={An AI-Ready Multiplex Staining Dataset for Reproducible and Accurate Characterization of Tumor Immune Microenvironment},
598
+ author={Ghahremani, Parmida and Marino, Joseph and Hernandez-Prera, Juan and V. de la Iglesia, Janis and JC Slebos, Robbert and H. Chung, Christine and Nadeem, Saad},
599
+ journal={International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
600
+ volume={14225},
601
+ pages={704--713},
602
+ year={2023}
603
+ }
604
+
605
+ @article{nadeem2023ki67validationMTC,
606
+ author = {Nadeem, Saad and Hanna, Matthew G and Viswanathan, Kartik and Marino, Joseph and Ahadi, Mahsa and Alzumaili, Bayan and Bani, Mohamed-Amine and Chiarucci, Federico and Chou, Angela and De Leo, Antonio and Fuchs, Talia L and Lubin, Daniel J and Luxford, Catherine and Magliocca, Kelly and Martinez, Germán and Shi, Qiuying and Sidhu, Stan and Al Ghuzlan, Abir and Gill, Anthony J and Tallini, Giovanni and Ghossein, Ronald and Xu, Bin},
607
+ title = {Ki67 proliferation index in medullary thyroid carcinoma: a comparative study of multiple counting methods and validation of image analysis and deep learning platforms},
608
+ journal = {Histopathology},
609
+ volume = {83},
610
+ number = {6},
611
+ pages = {981--988},
612
+ year = {2023},
613
+ doi = {https://doi.org/10.1111/his.15048}
614
+ }
615
+
616
+ @article{zehra2024deepliifstitch,
617
+ author = {Zehra, Talat and Marino, Joseph and Wang, Wendy and Frantsuzov, Grigoriy and Nadeem, Saad},
618
+ title = {Rethinking Histology Slide Digitization Workflows for Low-Resource Settings},
619
+ journal = {International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
620
+ volume = {15004},
621
+ pages = {427--436},
622
+ year = {2024}
623
+ }
624
+ ```