spatialformer 0.0.6__tar.gz
- spatialformer-0.0.6/LICENSE +21 -0
- spatialformer-0.0.6/PKG-INFO +11 -0
- spatialformer-0.0.6/README.md +140 -0
- spatialformer-0.0.6/setup.cfg +4 -0
- spatialformer-0.0.6/setup.py +22 -0
- spatialformer-0.0.6/spatialformer/GraphSAGE.py +765 -0
- spatialformer-0.0.6/spatialformer/__init__.py +23 -0
- spatialformer-0.0.6/spatialformer/data_loader.py +499 -0
- spatialformer-0.0.6/spatialformer/graphsage.py +362 -0
- spatialformer-0.0.6/spatialformer/processor.py +107 -0
- spatialformer-0.0.6/spatialformer/tools/__init__.py +2 -0
- spatialformer-0.0.6/spatialformer/tools/get_embeddings.py +404 -0
- spatialformer-0.0.6/spatialformer/train.py +234 -0
- spatialformer-0.0.6/spatialformer/utils.py +947 -0
- spatialformer-0.0.6/spatialformer.egg-info/PKG-INFO +11 -0
- spatialformer-0.0.6/spatialformer.egg-info/SOURCES.txt +17 -0
- spatialformer-0.0.6/spatialformer.egg-info/dependency_links.txt +1 -0
- spatialformer-0.0.6/spatialformer.egg-info/requires.txt +27 -0
- spatialformer-0.0.6/spatialformer.egg-info/top_level.txt +1 -0
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2025 Jun Wang
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
|
@@ -0,0 +1,11 @@
|
|
|
1
|
+
Metadata-Version: 2.1
|
|
2
|
+
Name: spatialformer
|
|
3
|
+
Version: 0.0.6
|
|
4
|
+
Summary: A single-cell foundation model focused on spatial cell-cell colocalization
|
|
5
|
+
Home-page: https://github.com/TerminatorJ/Spatialformer/
|
|
6
|
+
Author: TerminatorJ
|
|
7
|
+
Author-email: wangjun19950708@gmail.com
|
|
8
|
+
Classifier: Programming Language :: Python :: 3
|
|
9
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
10
|
+
Requires-Python: >=3.9
|
|
11
|
+
License-File: LICENSE
|
|
@@ -0,0 +1,140 @@
|
|
|
1
|
+
---
|
|
2
|
+
|
|
3
|
+
This is the official codebase for SpatialFormer, the first single-cell spatial foundation model to learn a universal representation (subcellular molecular and cellular spatial proximity) through multi-task learning.
|
|
4
|
+
|
|
5
|
+
|
|
6
|
+
|
|
7
|
+

|
|
8
|
+
|
|
9
|
+
## Overview
|
|
10
|
+
Spatial transcriptomics quantifies gene expression within its spatial context, significantly advancing biomedical research. Understanding gene spatial expression and the organization of multicellular systems is vital for disease diagnosis and studying biological processes. However, existing models often struggle to integrate gene expression data with cellular spatial information effectively. In this study, we introduce SpatialFormer, a hybrid framework combining convolutional networks and transformers to learn single-cell multimodal and multi-scale information in the niche context, including expression data and subcellular gene spatial distribution. Pre-trained on 300 million cell pairs from 12 million spatially resolved single cells across 62 Xenium slides, SpatialFormer merges gene spatial expression profiles with cell niche information via the pair-wise training strategy. Our findings demonstrate that SpatialFormer distills biological signals across various tasks, including single-cell batch correction, cell-type annotation, co-localization detection, and identifying gene pairs that are critical for the immune cell-cell interactions involved in the regulation of lung fibrosis. These advancements enhance our understanding of cellular dynamics and offer new pathways for applications in biomedical research.
|
|
11
|
+
|
|
12
|
+
|
|
13
|
+
## System Requirements
|
|
14
|
+
### Hardware requirements
|
|
15
|
+
We provide GPU and CPU versions for users with different hardware. However, if a large number of cells needs to be processed, a GPU is required to obtain results efficiently. Both AMD and NVIDIA GPUs are supported.
|
|
16
|
+
### Software requirements
|
|
17
|
+
#### OS requirements
|
|
18
|
+
This package is supported on macOS and Linux. It has been tested on the following systems:
|
|
19
|
+
- macOS: Sequoia (15.3.1)
|
|
20
|
+
- Linux: Ubuntu 16.04; SLES 15.5
|
|
21
|
+
|
|
22
|
+
#### Python environment requirements
|
|
23
|
+
Create the spatialformer environment with Anaconda (Python >= 3.9 is required):
|
|
24
|
+
```bash
|
|
25
|
+
conda create -n spatialformer python=3.9
|
|
26
|
+
```
|
|
27
|
+
|
|
28
|
+
## Installation
|
|
29
|
+
|
|
30
|
+
### Install from PyPI
|
|
31
|
+
If you are using AMD GPUs:
|
|
32
|
+
```bash
|
|
33
|
+
pip install SpatialFormer --extra-index-url https://download.pytorch.org/whl/rocm6.0
|
|
34
|
+
```
|
|
35
|
+
Alternatively, if you are using NVIDIA GPUs:
|
|
36
|
+
```bash
|
|
37
|
+
pip install SpatialFormer --extra-index-url https://download.pytorch.org/whl/cu121
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
### Install from GitHub
|
|
41
|
+
```bash
|
|
42
|
+
git clone https://github.com/TerminatorJ/Spatialformer/
|
|
43
|
+
cd Spatialformer
|
|
44
|
+
```
|
|
45
|
+
If you are using AMD GPUs:
|
|
46
|
+
```bash
|
|
47
|
+
pip install -e . --extra-index-url https://download.pytorch.org/whl/rocm6.0
|
|
48
|
+
```
|
|
49
|
+
Alternatively, if you are using NVIDIA GPUs:
|
|
50
|
+
```bash
|
|
51
|
+
pip install -e . --extra-index-url https://download.pytorch.org/whl/cu121
|
|
52
|
+
```
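After installing, it can be useful to confirm which PyTorch build pip actually resolved (ROCm, CUDA, or CPU-only). The following is a minimal sketch, not part of SpatialFormer itself: `describe_torch_backend` is a hypothetical helper name, and it assumes only that PyTorch is the backend pulled in by the extra index URLs above.

```python
import importlib.util


def describe_torch_backend():
    """Return a short description of the installed PyTorch build, if any."""
    # Avoid an ImportError when PyTorch is missing from the environment.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"

    import torch

    # ROCm wheels set torch.version.hip; CUDA wheels set torch.version.cuda.
    if getattr(torch.version, "hip", None):
        build = f"ROCm {torch.version.hip}"
    elif getattr(torch.version, "cuda", None):
        build = f"CUDA {torch.version.cuda}"
    else:
        build = "CPU-only"

    # torch.cuda.is_available() also reports AMD GPUs on ROCm builds.
    gpu = "GPU visible" if torch.cuda.is_available() else "no GPU visible"
    return f"torch {torch.__version__} ({build}, {gpu})"


print(describe_torch_backend())
```

If the reported build does not match your hardware, reinstalling with the matching `--extra-index-url` is the likely fix.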
|
|
53
|
+
|
|
54
|
+
|
|
55
|
+
|
|
56
|
+
|
|
57
|
+
## Pretraining data
|
|
58
|
+
|
|
59
|
+
The model accepts both individual cells and cell pairs (doublets) as input. It was originally pretrained on a large-scale dataset of pairwise doublets labeled as positive or negative. Positive pairs consist of a query cell and any cell located within its niche. Negative pairs combine the query cell with a distant cell, which is either far away from the query cell or only one hop away from its corresponding positive pairs. Positive and negative pairs are kept at a 1:1 ratio, so that each training batch contains 50% positive and 50% negative examples.
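The balanced pairing described above can be illustrated with a small sketch. This is not the preprocessing code from `/data_preprocess/`: the function name `sample_balanced_pairs`, the fixed Euclidean `radius` used to approximate a niche, and the toy coordinates are all assumptions made for illustration, and the "one hop away" negatives are not modeled.

```python
import math
import random


def sample_balanced_pairs(coords, query, radius, rng):
    """Return (positive_pairs, negative_pairs) for one query cell at a 1:1 ratio."""
    qx, qy = coords[query]
    near, far = [], []
    for cell, (x, y) in coords.items():
        if cell == query:
            continue
        # Assumption: the niche is approximated by a fixed Euclidean radius.
        if math.hypot(x - qx, y - qy) <= radius:
            near.append(cell)
        else:
            far.append(cell)

    # Keep the same number of positives and negatives (1:1 ratio).
    n = min(len(near), len(far))
    positives = [(query, c) for c in rng.sample(near, n)]
    negatives = [(query, c) for c in rng.sample(far, n)]
    return positives, negatives


# Toy coordinates (hypothetical cell IDs and positions).
coords = {
    "c0": (0.0, 0.0), "c1": (1.0, 0.0), "c2": (0.0, 1.5),
    "c3": (10.0, 10.0), "c4": (12.0, 9.0), "c5": (15.0, 0.0),
}
pos, neg = sample_balanced_pairs(coords, "c0", radius=2.0, rng=random.Random(0))
print(pos, neg)
```

Sampling the same number from each group is what keeps every batch drawn from these pairs at 50% positive and 50% negative.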
|
|
60
|
+
|
|
61
|
+
The processed individual cell dataset can be retrieved from the Hugging Face dataset repository at [SpatialCC-12M](https://huggingface.co/datasets/TerminatorJ/xenium_pandavid_dataset). The pairwise data can be generated by following the instructions provided in `/data_preprocess/`.
|
|
62
|
+
|
|
63
|
+
You can download the dataset in Python as follows:
|
|
64
|
+
```python
|
|
65
|
+
from datasets import load_dataset
|
|
66
|
+
spatialcc = load_dataset("TerminatorJ/xenium_pandavid_dataset4", cache_dir = "your_cache_dir")
|
|
67
|
+
```
|
|
68
|
+
Alternatively, the original pretraining dataset can be downloaded from figshare: [SpatialCC-12M](https://figshare.com/articles/dataset/SpatialCC-12M/28436606?file=52449404) (DOI: https://doi.org/10.6084/m9.figshare.28436606.v1).
|
|
69
|
+
|
|
70
|
+
|
|
71
|
+
|
|
72
|
+
## Get the Embeddings
|
|
73
|
+
|
|
74
|
+
SpatialFormer provides a simple function for extracting embeddings. The `sp.tl.embed_data()` function works directly with an AnnData object: the generated embeddings are stored in `obsm` under the key `"X_SpaF"`.
|
|
75
|
+
|
|
76
|
+
SpatialFormer supports two methods for generating embeddings: 1) single input mode and 2) pairwise input mode. Below is an example of generating the AnnData embeddings:
|
|
77
|
+
|
|
78
|
+
A small example AnnData file can be downloaded [here](downstream/cell_cell_communication/data/covid_subsampled.h5ad).
|
|
79
|
+
|
|
80
|
+
|
|
81
|
+
#### Loading the anndata
|
|
82
|
+
|
|
83
|
+
```python
|
|
84
|
+
import scanpy as sc
|
|
85
|
+
adata = sc.read_h5ad("./downstream/cell_cell_communication/data/covid_subsampled.h5ad")
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
|
|
89
|
+
#### Single Input Mode
|
|
90
|
+
```python
|
|
91
|
+
import spatialformer as sp
|
|
92
|
+
embed_adata = sp.tl.embed_data(adata,
|
|
93
|
+
tissue,
|
|
94
|
+
condition,
|
|
95
|
+
method,
|
|
96
|
+
model_ckp_path,
|
|
97
|
+
batch_size,
|
|
98
|
+
mode = "single",
|
|
99
|
+
threshold = 0.7,
|
|
100
|
+
num_workers = 8
|
|
101
|
+
)
|
|
102
|
+
```
|
|
103
|
+
#### Pairwise Input Mode
|
|
104
|
+
```python
|
|
105
|
+
embed_adata = sp.tl.embed_data(adata,
|
|
106
|
+
tissue,
|
|
107
|
+
condition,
|
|
108
|
+
method,
|
|
109
|
+
model_ckp_path,
|
|
110
|
+
batch_size,
|
|
111
|
+
mode = "pair",
|
|
112
|
+
left_cell = ["aacid_0789", "aacid_0799"],
|
|
113
|
+
right_cell = ["aacid_0635", "aacid_0652"],
|
|
114
|
+
num_workers = 8
|
|
115
|
+
)
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
| Arguments | dtype |Description |
|
|
119
|
+
| :------------------------ | :--------- | :--------- |
|
|
120
|
+
| adata | object | An AnnData object that stores expression information by CellXGene.|
|
|
121
|
+
| tissue | string | The type of tissue (e.g., Breast/Lung).|
|
|
122
|
+
| condition | string | Metadata for the sample condition (e.g., Disease/Healthy). |
|
|
123
|
+
| method | string | The method of the embed function, which can be either "single" or "pair". In "single" mode, only individual cells are collated as input for the model. In "pair" mode, data is prepared for pairwise input. If using "pair", both left_cell and right_cell must be provided, and their lengths must be the same. Each cell ID in left_cell corresponds to the cell ID at the same index in right_cell. |
|
|
124
|
+
| model_ckp_path | string | The path to the SpatialFormer model checkpoint.|
|
|
125
|
+
| batch_size | integer | The batch size for the data loader.|
|
|
126
|
+
| threshold | float | The threshold for filtering whether two genes are paired, which helps in identifying confidently paired genes at subcellular resolution. This option is applicable only in "single" input mode and is not functional in "pair" mode.|
|
|
127
|
+
| left_cell | array_like | A list of cell IDs representing the query cells.|
|
|
128
|
+
| right_cell | array_like | A list of cell IDs representing the key cells. |
|
|
129
|
+
| num_workers | integer | The number of CPU cores to load the data. This value should match the number of workers specified in the data loader.|
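
The pair-mode constraints in the table (equal lengths, index-wise correspondence) can be checked before calling the embedding function. The sketch below is a hypothetical pre-check, not part of the SpatialFormer API: `validate_pairs` is an illustrative name, and it assumes that cell IDs are matched against the AnnData observation names (for example `adata.obs_names`).

```python
def validate_pairs(obs_names, left_cell, right_cell):
    """Check pair-mode inputs and return the aligned (left, right) ID pairs."""
    if len(left_cell) != len(right_cell):
        raise ValueError(
            f"left_cell has {len(left_cell)} IDs but right_cell has {len(right_cell)}"
        )

    # Assumption: every ID must exist among the observation names.
    known = set(obs_names)
    missing = [c for c in list(left_cell) + list(right_cell) if c not in known]
    if missing:
        raise ValueError(f"cell IDs not found: {missing}")

    # The i-th query cell is paired with the i-th key cell.
    return list(zip(left_cell, right_cell))


# Stand-in for adata.obs_names, using the IDs from the example above.
obs_names = ["aacid_0635", "aacid_0652", "aacid_0789", "aacid_0799"]
pairs = validate_pairs(
    obs_names,
    left_cell=["aacid_0789", "aacid_0799"],
    right_cell=["aacid_0635", "aacid_0652"],
)
print(pairs)
```

Failing early with a clear message is cheaper than discovering a mismatched pair list after the data loader has started.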
|
|
130
|
+
|
|
131
|
+
### Star Trend
|
|
132
|
+
|
|
133
|
+
[Star history chart](https://star-history.com/#TerminatorJ/Spatialformer&Date)
|
|
134
|
+
|
|
135
|
+
|
|
136
|
+
|
|
137
|
+
## Cite our work
|
|
138
|
+
Wang J, Huang Y, Winther O. SpatialFormer: Universal Spatial Representation Learning from Subcellular Molecular to Multicellular Landscapes[J]. bioRxiv, 2025: 2025.01.18.633701.
|
|
139
|
+
|
|
140
|
+
|
|
@@ -0,0 +1,22 @@
|
|
|
1
|
+
from setuptools import setup, find_packages
|
|
2
|
+
|
|
3
|
+
|
|
4
|
+
with open('./requirements.txt', 'r') as f:
|
|
5
|
+
    requirements = f.read().splitlines()
|
|
6
|
+
|
|
7
|
+
setup(
|
|
8
|
+
name='spatialformer',
|
|
9
|
+
version='0.0.6',
|
|
10
|
+
author='TerminatorJ',
|
|
11
|
+
author_email='wangjun19950708@gmail.com',
|
|
12
|
+
description='A single-cell foundation model focused on spatial cell-cell colocalization',
|
|
13
|
+
url='https://github.com/TerminatorJ/Spatialformer/',
|
|
14
|
+
classifiers=[
|
|
15
|
+
'Programming Language :: Python :: 3',
|
|
16
|
+
'License :: OSI Approved :: MIT License',
|
|
17
|
+
],
|
|
18
|
+
python_requires='>=3.9',
|
|
19
|
+
install_requires=requirements,
|
|
20
|
+
packages=find_packages(include=['spatialformer', 'spatialformer.tools']),
|
|
21
|
+
|
|
22
|
+
)
|
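The `setup.py` above reads `./requirements.txt` unconditionally, while the file list for this release does not show a `requirements.txt`, so the read may fail when building from the source archive. A more defensive variant could look like the sketch below. It is only an illustration under stated assumptions: `read_requirements` is a hypothetical helper, and returning an empty list when the file is missing is one possible policy, not the author's.

```python
from pathlib import Path


def read_requirements(path="requirements.txt"):
    """Return requirement strings from `path`, or [] if the file is absent."""
    req_file = Path(path)
    if not req_file.is_file():
        # Assumption: installing without pinned dependencies is acceptable here.
        return []
    lines = req_file.read_text(encoding="utf-8").splitlines()
    # Skip blank lines and comments so they never reach install_requires.
    return [
        line.strip()
        for line in lines
        if line.strip() and not line.strip().startswith("#")
    ]


requirements = read_requirements()
print(requirements)
```

The result can be passed as `install_requires=requirements` in the same way as in the original call to `setup()`. Shipping the file in the archive (for example through `MANIFEST.in`) would be an alternative that keeps the original code unchanged.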