cellSP 0.0.1__tar.gz
- cellsp-0.0.1/LICENSE +21 -0
- cellsp-0.0.1/PKG-INFO +137 -0
- cellsp-0.0.1/README.md +128 -0
- cellsp-0.0.1/cellSP/__init__.py +7 -0
- cellsp-0.0.1/cellSP/characterize/__init__.py +2 -0
- cellsp-0.0.1/cellSP/characterize/_bicluster.py +384 -0
- cellsp-0.0.1/cellSP/characterize/_instant.py +174 -0
- cellsp-0.0.1/cellSP/characterize/_sprawl.py +448 -0
- cellsp-0.0.1/cellSP/characterize/_utils.py +70 -0
- cellsp-0.0.1/cellSP/datasets/__init__.py +1 -0
- cellsp-0.0.1/cellSP/datasets/_datasets.py +123 -0
- cellsp-0.0.1/cellSP/geo/__init__.py +1 -0
- cellsp-0.0.1/cellSP/geo/_geo.py +134 -0
- cellsp-0.0.1/cellSP/io/__init__.py +1 -0
- cellsp-0.0.1/cellSP/io/_io.py +104 -0
- cellsp-0.0.1/cellSP/model/__init__.py +1 -0
- cellsp-0.0.1/cellSP/model/_model.py +215 -0
- cellsp-0.0.1/cellSP/preprocessing/__init__.py +2 -0
- cellsp-0.0.1/cellSP/preprocessing/_extrapolate.py +262 -0
- cellsp-0.0.1/cellSP/preprocessing/_impute.py +24 -0
- cellsp-0.0.1/cellSP/visualisation/__init__.py +5 -0
- cellsp-0.0.1/cellSP/visualisation/_circularize.py +458 -0
- cellsp-0.0.1/cellSP/visualisation/_enrichment.py +170 -0
- cellsp-0.0.1/cellSP/visualisation/_raw.py +56 -0
- cellsp-0.0.1/cellSP/visualisation/_report.py +258 -0
- cellsp-0.0.1/cellSP/visualisation/_validation.py +16 -0
- cellsp-0.0.1/cellSP.egg-info/PKG-INFO +137 -0
- cellsp-0.0.1/cellSP.egg-info/SOURCES.txt +30 -0
- cellsp-0.0.1/cellSP.egg-info/dependency_links.txt +1 -0
- cellsp-0.0.1/cellSP.egg-info/top_level.txt +3 -0
- cellsp-0.0.1/pyproject.toml +20 -0
- cellsp-0.0.1/setup.cfg +4 -0
cellsp-0.0.1/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2024 Bhavay Aggarwal
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
cellsp-0.0.1/PKG-INFO
ADDED
|
@@ -0,0 +1,137 @@
|
|
|
1
|
+
Metadata-Version: 2.2
|
|
2
|
+
Name: cellSP
|
|
3
|
+
Version: 0.0.1
|
|
4
|
+
Summary: cellSP.
|
|
5
|
+
Author-email: Bhavay Aggarwal <bhavayaggarwal07@gmail.com>
|
|
6
|
+
Requires-Python: >=3.12
|
|
7
|
+
Description-Content-Type: text/markdown
|
|
8
|
+
License-File: LICENSE
|
|
9
|
+
|
|
10
|
+
# CellSP
|
|
11
|
+
__Note: Repository is work in progress__
|
|
12
|
+
<br>
|
|
13
|
+
<br>
|
|
14
|
+
CellSP is a Python package for the analysis of subcellular spatial transcriptomic data. CellSP works with datasets generated at single-molecule resolution by technologies such as Xenium, CosMx, and MERSCOPE, as well as other ISH-like data. Using the existing tools [InSTAnT](https://github.com/bhavaygg/InSTAnT) and [SPRAWL](https://github.com/salzman-lab/SPRAWL/), CellSP identifies statistically significant subcellular patterns of gene transcripts and uses a biclustering algorithm to aggregate these patterns over hundreds of cells to produce "gene-cell modules". These modules represent the consistent detection of the same subcellular pattern by a set of genes in the same cells and offer a summarized and biologically interpretable description of subcellular patterns. CellSP provides specialized techniques for visualizing such modules and their defining spatial patterns. Additionally, CellSP uses Gene Ontology (GO) enrichment tests to offer functional insights into the genes comprising the module as well as the cells comprising the module.
|
|
15
|
+
|
|
16
|
+

|
|
17
|
+
|
|
18
|
+
***
|
|
19
|
+
|
|
20
|
+
## How to install CellSP
|
|
21
|
+
|
|
22
|
+
We recommend using our `environment.yml` file to create a new conda environment to avoid issues with package incompatibility.
|
|
23
|
+
|
|
24
|
+
```
|
|
25
|
+
conda env create -f environment.yml
|
|
26
|
+
```
|
|
27
|
+
This creates a new conda environment named `CellSP` with all dependencies installed.
|
|
28
|
+
|
|
29
|
+
Alternatively, the package can be installed using pip.
|
|
30
|
+
|
|
31
|
+
```
|
|
32
|
+
pip install CellSP
|
|
33
|
+
```
|
|
34
|
+
|
|
35
|
+
__Note: Not operational as of now__
|
|
36
|
+
|
|
37
|
+
***
|
|
38
|
+
## How to use CellSP
|
|
39
|
+
|
|
40
|
+
CellSP expects both single-cell and spatial transcriptomic data to be in AnnData format. The data can be loaded using
|
|
41
|
+
|
|
42
|
+
```
|
|
43
|
+
adata_sc, adata_st = cellSP.ds.load_data(sc_adata="files/adata_sc.h5ad", st_adata="files/adata_st.h5ad")
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
**Note - Single-cell data from the same tissue is required for characterization of the module cells.**
|
|
47
|
+
|
|
48
|
+
To load raw csv data, refer to [file]() for instructions.
|
|
49
|
+
|
|
50
|
+
CellSP preprocesses the input single-cell data by denoising it with [MAGIC](https://github.com/KrishnaswamyLab/MAGIC) and imputes the expression of genes not in the ST panel using [Tangram](https://github.com/broadinstitute/Tangram/).
|
|
51
|
+
|
|
52
|
+
```
|
|
53
|
+
adata_sc = cellSP.pp.impute(adata_sc, t="auto")
|
|
54
|
+
adata_st = cellSP.pp.run_tangram(adata_sc, adata_st, device='cpu')
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
After Tangram imputation, the single-cell and spatial AnnData objects are combined into one. This completes the preprocessing required for using CellSP. Preprocessing can be skipped if cellular characterization is not required.
|
|
58
|
+
|
|
59
|
+
There are three main steps involved in running CellSP -
|
|
60
|
+
1. Subcellular Pattern Discovery
|
|
61
|
+
2. Module Discovery
|
|
62
|
+
3. Module Characterization
|
|
63
|
+
|
|
64
|
+
### Subcellular Pattern Discovery
|
|
65
|
+
|
|
66
|
+
CellSP uses InSTAnT and SPRAWL for identifying statistically significant subcellular patterns. InSTAnT tests if transcripts of a gene pair tend to be proximal to each other more often than expected by chance, while SPRAWL identifies four types of subcellular patterns – peripheral, radial, punctate and central – describing the distribution of a gene’s transcripts within the cell.
|
|
67
|
+
|
|
68
|
+
To run InSTAnT, CellSP has two primary parameters -
|
|
69
|
+
- `distance_threshold`: The distance (in microns) within which transcripts of two genes are considered proximal.
|
|
70
|
+
- `alpha_cpb`: p-value significance threshold below which a gene pair is considered colocalized for the CPB test. Default = 1e-3
|
|
71
|
+
|
|
72
|
+
```
|
|
73
|
+
adata_st = cellSP.ch.run_instant(adata_st = adata_st, distance_threshold=2, alpha_cpb=1e-5)
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
To run SPRAWL, CellSP uses the default parameters from the original implementation.
|
|
77
|
+
|
|
78
|
+
```
|
|
79
|
+
adata_st = cellSP.ch.run_sprawl(adata_st)
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
### Module Discovery
|
|
83
|
+
|
|
84
|
+
CellSP uses a biclustering tool, LAS, to analyze each of the patterns and identify "gene-cell modules". Each module represents a set of genes or gene pairs that exhibit the same type of subcellular pattern in the same set of cells, with statistical significance estimated by a Bonferroni-based score.
|
|
85
|
+
|
|
86
|
+
CellSP has two functions for module discovery, one for SPRAWL and one for InSTAnT. Both functions share the same parameters, but the InSTAnT function has two additional parameters -
|
|
87
|
+
- `alpha`: p-value significance threshold below which a gene pair is considered for biclustering. Default = 1e-3
|
|
88
|
+
- `topk`: Select only the K most significant gene pairs that have p-value < `alpha`. Default = None
|
|
89
|
+
|
|
90
|
+
These parameters are used to restrict the number of gene pairs over which biclustering is performed, in order to reduce the computational cost.
|
|
91
|
+
|
|
92
|
+
The other parameters used are -
|
|
93
|
+
- `num_biclusters`: Number of modules to find. Default = 10.
|
|
94
|
+
- `randomized_searches`: Number of randomized searches to perform in LAS. Default = 50000.
|
|
95
|
+
|
|
96
|
+
```
|
|
97
|
+
adata_st = cellSP.ch.bicluster_instant(adata_st, distance_threshold=2, threads=128, alpha=1e-5, num_biclusters=50, randomized_searches=50000)
|
|
98
|
+
adata_st = cellSP.ch.bicluster_sprawl(adata_st, threads=128, num_biclusters=50, randomized_searches=50000)
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
### Module Characterization
|
|
102
|
+
|
|
103
|
+
To aid biological interpretation, CellSP reports shared properties of the genes and cells of each discovered module. Genes are characterized using Gene Ontology (GO) enrichment tests, while cells are characterized by their cell type composition if such information is available. To provide a more precise characterization of a module’s cells, CellSP trains a machine learning classifier to discriminate those cells from all other cells, using the expression levels of all genes other than the module genes. Genes that are highly predictive in this task are then subjected to GO enrichment tests, furnishing hypotheses about biological processes and pathways that are active specifically in the module cells.
|
|
104
|
+
|
|
105
|
+
To characterize the module genes -
|
|
106
|
+
|
|
107
|
+
```
|
|
108
|
+
adata_st = cellSP.geo.geo_analysis(adata_st, setting="module")
|
|
109
|
+
```
|
|
110
|
+
|
|
111
|
+
|
|
112
|
+
To characterize the module cells, we first train a random forest classifier to find genes that are predictive of module presence and then perform enrichment tests -
|
|
113
|
+
```
|
|
114
|
+
adata_st = cellSP.md.model_modules(adata_st, do_shap=True, subsample=True)
|
|
115
|
+
adata_st = cellSP.geo.geo_analysis(adata_st, setting="cell")
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
### Visualization
|
|
119
|
+
|
|
120
|
+
To help visualize modules defined by the five types of subcellular spatial patterns (four types identified by SPRAWL and colocalization patterns identified by InSTAnT), we developed three complementary plotting techniques.
|
|
121
|
+
|
|
122
|
+

|
|
123
|
+
|
|
124
|
+
***
|
|
125
|
+
|
|
126
|
+
### How to cite CellSP
|
|
127
|
+
|
|
128
|
+
```
|
|
129
|
+
@article{aggarwal2025cellsp,
|
|
130
|
+
title={CellSP: Module discovery and visualization for subcellular spatial transcriptomics data},
|
|
131
|
+
author={Aggarwal, Bhavay and Sinha, Saurabh},
|
|
132
|
+
journal={bioRxiv},
|
|
133
|
+
pages={2025--01},
|
|
134
|
+
year={2025},
|
|
135
|
+
publisher={Cold Spring Harbor Laboratory}
|
|
136
|
+
}
|
|
137
|
+
```
|
cellsp-0.0.1/README.md
ADDED
|
@@ -0,0 +1,128 @@
|
|
|
1
|
+
# CellSP
|
|
2
|
+
__Note: Repository is work in progress__
|
|
3
|
+
<br>
|
|
4
|
+
<br>
|
|
5
|
+
CellSP is a Python package for the analysis of subcellular spatial transcriptomic data. CellSP works with datasets generated at single-molecule resolution by technologies such as Xenium, CosMx, and MERSCOPE, as well as other ISH-like data. Using the existing tools [InSTAnT](https://github.com/bhavaygg/InSTAnT) and [SPRAWL](https://github.com/salzman-lab/SPRAWL/), CellSP identifies statistically significant subcellular patterns of gene transcripts and uses a biclustering algorithm to aggregate these patterns over hundreds of cells to produce "gene-cell modules". These modules represent the consistent detection of the same subcellular pattern by a set of genes in the same cells and offer a summarized and biologically interpretable description of subcellular patterns. CellSP provides specialized techniques for visualizing such modules and their defining spatial patterns. Additionally, CellSP uses Gene Ontology (GO) enrichment tests to offer functional insights into the genes comprising the module as well as the cells comprising the module.
|
|
6
|
+
|
|
7
|
+

|
|
8
|
+
|
|
9
|
+
***
|
|
10
|
+
|
|
11
|
+
## How to install CellSP
|
|
12
|
+
|
|
13
|
+
We recommend using our `environment.yml` file to create a new conda environment to avoid issues with package incompatibility.
|
|
14
|
+
|
|
15
|
+
```
|
|
16
|
+
conda env create -f environment.yml
|
|
17
|
+
```
|
|
18
|
+
This creates a new conda environment named `CellSP` with all dependencies installed.
|
|
19
|
+
|
|
20
|
+
Alternatively, the package can be installed using pip.
|
|
21
|
+
|
|
22
|
+
```
|
|
23
|
+
pip install CellSP
|
|
24
|
+
```
|
|
25
|
+
|
|
26
|
+
__Note: Not operational as of now__
|
|
27
|
+
|
|
28
|
+
***
|
|
29
|
+
## How to use CellSP
|
|
30
|
+
|
|
31
|
+
CellSP expects both single-cell and spatial transcriptomic data to be in AnnData format. The data can be loaded using
|
|
32
|
+
|
|
33
|
+
```
|
|
34
|
+
adata_sc, adata_st = cellSP.ds.load_data(sc_adata="files/adata_sc.h5ad", st_adata="files/adata_st.h5ad")
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
**Note - Single-cell data from the same tissue is required for characterization of the module cells.**
|
|
38
|
+
|
|
39
|
+
To load raw csv data, refer to [file]() for instructions.
|
|
40
|
+
|
|
41
|
+
CellSP preprocesses the input single-cell data by denoising it with [MAGIC](https://github.com/KrishnaswamyLab/MAGIC) and imputes the expression of genes not in the ST panel using [Tangram](https://github.com/broadinstitute/Tangram/).
|
|
42
|
+
|
|
43
|
+
```
|
|
44
|
+
adata_sc = cellSP.pp.impute(adata_sc, t="auto")
|
|
45
|
+
adata_st = cellSP.pp.run_tangram(adata_sc, adata_st, device='cpu')
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
After Tangram imputation, the single-cell and spatial AnnData objects are combined into one. This completes the preprocessing required for using CellSP. Preprocessing can be skipped if cellular characterization is not required.
|
|
49
|
+
|
|
50
|
+
There are three main steps involved in running CellSP -
|
|
51
|
+
1. Subcellular Pattern Discovery
|
|
52
|
+
2. Module Discovery
|
|
53
|
+
3. Module Characterization
|
|
54
|
+
|
|
55
|
+
### Subcellular Pattern Discovery
|
|
56
|
+
|
|
57
|
+
CellSP uses InSTAnT and SPRAWL for identifying statistically significant subcellular patterns. InSTAnT tests if transcripts of a gene pair tend to be proximal to each other more often than expected by chance, while SPRAWL identifies four types of subcellular patterns – peripheral, radial, punctate and central – describing the distribution of a gene’s transcripts within the cell.
|
|
58
|
+
|
|
59
|
+
To run InSTAnT, CellSP has two primary parameters -
|
|
60
|
+
- `distance_threshold`: The distance (in microns) within which transcripts of two genes are considered proximal.
|
|
61
|
+
- `alpha_cpb`: p-value significance threshold below which a gene pair is considered colocalized for the CPB test. Default = 1e-3
|
|
62
|
+
|
|
63
|
+
```
|
|
64
|
+
adata_st = cellSP.ch.run_instant(adata_st = adata_st, distance_threshold=2, alpha_cpb=1e-5)
|
|
65
|
+
```
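The notion of proximity controlled by `distance_threshold` can be illustrated with a small, self-contained sketch. This is not InSTAnT's statistical test, only the underlying idea of flagging gene pairs whose transcripts lie within a given distance in one cell. The function name `proximal_gene_pairs`, the tuple layout, and the choice to skip same-gene pairs are assumptions made for illustration.

```python
import math
from itertools import combinations


def proximal_gene_pairs(transcripts, distance_threshold):
    """Return gene pairs with at least one transcript pair within the threshold.

    `transcripts` is a list of (gene, x, y) tuples for a single cell,
    with coordinates in microns (hypothetical layout for this sketch).
    """
    pairs = set()
    for (g1, x1, y1), (g2, x2, y2) in combinations(transcripts, 2):
        if g1 == g2:
            continue  # this sketch only considers pairs of different genes
        if math.dist((x1, y1), (x2, y2)) <= distance_threshold:
            pairs.add(tuple(sorted((g1, g2))))
    return pairs


# Toy cell: two transcripts close together, one far away.
cell = [("ACTB", 0.0, 0.0), ("GAPDH", 1.0, 1.0), ("MYC", 10.0, 10.0)]
print(proximal_gene_pairs(cell, distance_threshold=2))  # {('ACTB', 'GAPDH')}
```

A real analysis would additionally ask whether such proximity occurs more often than expected by chance, which is the part InSTAnT provides.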
|
|
66
|
+
|
|
67
|
+
To run SPRAWL, CellSP uses the default parameters from the original implementation.
|
|
68
|
+
|
|
69
|
+
```
|
|
70
|
+
adata_st = cellSP.ch.run_sprawl(adata_st)
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
### Module Discovery
|
|
74
|
+
|
|
75
|
+
CellSP uses a biclustering tool, LAS, to analyze each of the patterns and identify "gene-cell modules". Each module represents a set of genes or gene pairs that exhibit the same type of subcellular pattern in the same set of cells, with statistical significance estimated by a Bonferroni-based score.
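To give a feel for what a Bonferroni-based bicluster score looks like, the sketch below computes the score from the published Large Average Submatrix (LAS) method: for a k x l submatrix with average entry tau inside an m x n matrix of standardized values, the score is -log[C(m,k) * C(n,l) * Phi(-tau * sqrt(k*l))]. Whether CellSP's LAS step uses exactly this form is an assumption here, and the function names are hypothetical.

```python
import math


def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_normal_upper_tail(z):
    """log P(Z >= z) for a standard normal Z."""
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    if tail > 0.0:
        return math.log(tail)
    # erfc underflows for very large z; fall back to the asymptotic form.
    return -0.5 * z * z - math.log(z) - 0.5 * math.log(2.0 * math.pi)


def las_score(m, n, k, l, tau):
    """Bonferroni-corrected significance score of a k x l submatrix.

    The two binomial terms count how many k x l submatrices exist in an
    m x n matrix; the tail term is the chance of seeing an average of at
    least `tau` in one of them under a standard normal null.
    """
    log_tail = log_normal_upper_tail(tau * math.sqrt(k * l))
    return -(log_binom(m, k) + log_binom(n, l) + log_tail)


# A higher average over the same submatrix size gives a higher score.
print(las_score(100, 200, 5, 10, 1.0) < las_score(100, 200, 5, 10, 2.0))
```

Larger scores indicate submatrices that are less likely to arise by chance after accounting for the number of candidate submatrices of the same size.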
|
|
76
|
+
|
|
77
|
+
CellSP has two functions for module discovery, one for SPRAWL and one for InSTAnT. Both functions share the same parameters, but the InSTAnT function has two additional parameters -
|
|
78
|
+
- `alpha`: p-value significance threshold below which a gene pair is considered for biclustering. Default = 1e-3
|
|
79
|
+
- `topk`: Select only the K most significant gene pairs that have p-value < `alpha`. Default = None
|
|
80
|
+
|
|
81
|
+
These parameters are used to restrict the number of gene pairs over which biclustering is performed, in order to reduce the computational cost.
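The effect of `alpha` and `topk` can be sketched as a simple filter over per-pair p-values. This is an illustration of the described behavior, not CellSP's code; the function name `select_gene_pairs` and the dictionary input are hypothetical.

```python
def select_gene_pairs(pvalues, alpha=1e-3, topk=None):
    """Keep gene pairs with p-value < alpha, optionally only the topk best.

    `pvalues` maps a gene pair (tuple of two gene names) to its p-value.
    Pairs are returned ordered from most to least significant.
    """
    significant = [(pair, p) for pair, p in pvalues.items() if p < alpha]
    significant.sort(key=lambda item: item[1])
    if topk is not None:
        significant = significant[:topk]
    return [pair for pair, _ in significant]


pvalues = {("A", "B"): 1e-6, ("A", "C"): 5e-4, ("B", "C"): 0.2}
print(select_gene_pairs(pvalues, alpha=1e-3))          # [('A', 'B'), ('A', 'C')]
print(select_gene_pairs(pvalues, alpha=1e-3, topk=1))  # [('A', 'B')]
```

Fewer retained gene pairs means a smaller matrix for the biclustering step to search.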
|
|
82
|
+
|
|
83
|
+
The other parameters used are -
|
|
84
|
+
- `num_biclusters`: Number of modules to find. Default = 10.
|
|
85
|
+
- `randomized_searches`: Number of randomized searches to perform in LAS. Default = 50000.
|
|
86
|
+
|
|
87
|
+
```
|
|
88
|
+
adata_st = cellSP.ch.bicluster_instant(adata_st, distance_threshold=2, threads=128, alpha=1e-5, num_biclusters=50, randomized_searches=50000)
|
|
89
|
+
adata_st = cellSP.ch.bicluster_sprawl(adata_st, threads=128, num_biclusters=50, randomized_searches=50000)
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
### Module Characterization
|
|
93
|
+
|
|
94
|
+
To aid biological interpretation, CellSP reports shared properties of the genes and cells of each discovered module. Genes are characterized using Gene Ontology (GO) enrichment tests, while cells are characterized by their cell type composition if such information is available. To provide a more precise characterization of a module’s cells, CellSP trains a machine learning classifier to discriminate those cells from all other cells, using the expression levels of all genes other than the module genes. Genes that are highly predictive in this task are then subjected to GO enrichment tests, furnishing hypotheses about biological processes and pathways that are active specifically in the module cells.
|
|
95
|
+
|
|
96
|
+
To characterize the module genes -
|
|
97
|
+
|
|
98
|
+
```
|
|
99
|
+
adata_st = cellSP.geo.geo_analysis(adata_st, setting="module")
|
|
100
|
+
```
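For readers unfamiliar with enrichment tests, a common formulation is the one-sided hypergeometric test: given a gene universe, a GO term annotating some of those genes, and a module gene set, it asks how likely an overlap at least as large as the observed one is by chance. The sketch below shows that calculation only; which test and gene universe `geo_analysis` actually uses is not stated here, and the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, overlap):
    """P(X >= overlap) when drawing `sample` genes from `population` genes,
    of which `annotated` carry the GO term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# 10 genes in total, 5 annotated; a 5-gene module containing all 5 of them.
print(hypergeom_enrichment_pvalue(10, 5, 5, 5))
```

In practice the p-values for many GO terms are computed together and corrected for multiple testing.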
|
|
101
|
+
|
|
102
|
+
|
|
103
|
+
To characterize the module cells, we first train a random forest classifier to find genes that are predictive of module presence and then perform enrichment tests -
|
|
104
|
+
```
|
|
105
|
+
adata_st = cellSP.md.model_modules(adata_st, do_shap=True, subsample=True)
|
|
106
|
+
adata_st = cellSP.geo.geo_analysis(adata_st, setting="cell")
|
|
107
|
+
```
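The classification task described above (module cells versus all other cells, using only genes outside the module) can be sketched as a data-preparation step. This is a minimal illustration under assumed inputs, not the logic of `model_modules`; the function name and the dictionary-based expression layout are hypothetical.

```python
def build_module_task(expression, genes, module_cells, module_genes):
    """Build features and labels for a 'module cell vs. other cell' classifier.

    `expression` maps cell id -> list of expression values aligned with `genes`.
    Module genes are dropped from the features so that the classifier has to
    rely on the rest of the transcriptome.
    """
    excluded = set(module_genes)
    keep = [i for i, gene in enumerate(genes) if gene not in excluded]
    features = [genes[i] for i in keep]
    members = set(module_cells)
    X, y = [], []
    for cell_id, values in expression.items():
        X.append([values[i] for i in keep])
        y.append(1 if cell_id in members else 0)
    return features, X, y


genes = ["G1", "G2", "G3"]
expression = {"c1": [5.0, 0.0, 2.0], "c2": [1.0, 3.0, 0.0], "c3": [4.0, 1.0, 2.5]}
features, X, y = build_module_task(expression, genes, ["c1", "c3"], ["G2"])
print(features, X, y)
```

The resulting `X` and `y` could then be passed to any classifier, and the most predictive features forwarded to an enrichment test.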
|
|
108
|
+
|
|
109
|
+
### Visualization
|
|
110
|
+
|
|
111
|
+
To help visualize modules defined by the five types of subcellular spatial patterns (four types identified by SPRAWL and colocalization patterns identified by InSTAnT), we developed three complementary plotting techniques.
|
|
112
|
+
|
|
113
|
+

|
|
114
|
+
|
|
115
|
+
***
|
|
116
|
+
|
|
117
|
+
### How to cite CellSP
|
|
118
|
+
|
|
119
|
+
```
|
|
120
|
+
@article{aggarwal2025cellsp,
|
|
121
|
+
title={CellSP: Module discovery and visualization for subcellular spatial transcriptomics data},
|
|
122
|
+
author={Aggarwal, Bhavay and Sinha, Saurabh},
|
|
123
|
+
journal={bioRxiv},
|
|
124
|
+
pages={2025--01},
|
|
125
|
+
year={2025},
|
|
126
|
+
publisher={Cold Spring Harbor Laboratory}
|
|
127
|
+
}
|
|
128
|
+
```
|