spatialformer 0.0.6__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Jun Wang
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,11 @@
1
+ Metadata-Version: 2.1
2
+ Name: spatialformer
3
+ Version: 0.0.6
4
+ Summary: A single-cell foundation model focused on spatial cell-cell colocalization
5
+ Home-page: https://github.com/TerminatorJ/Spatialformer/
6
+ Author: TerminatorJ
7
+ Author-email: wangjun19950708@gmail.com
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Requires-Python: >=3.9
11
+ License-File: LICENSE
@@ -0,0 +1,140 @@
1
+ ---
2
+
3
+ This is the official codebase for SpatialFormer, the first single-cell spatial foundation model to learn a universal representation (subcellular molecular and cellular spatial proximity) through multi-task learning.
4
+
5
+ [![License](https://img.shields.io/badge/license-MIT-blue)](https://github.com/TerminatorJ/Spatialformer/blob/main/LICENSE)
6
+
7
+ ![SpatialFormer](./rm_figs/github_main_figure.png)
8
+
9
+ ## Overview
10
+ Spatial transcriptomics quantifies gene expression within its spatial context, significantly advancing biomedical research. Understanding gene spatial expression and the organization of multicellular systems is vital for disease diagnosis and studying biological processes. However, existing models often struggle to integrate gene expression data with cellular spatial information effectively. In this study, we introduce SpatialFormer, a hybrid framework combining convolutional networks and transformers to learn single-cell multimodal and multi-scale information in the niche context, including expression data and subcellular gene spatial distribution. Pre-trained on 300 million cell pairs from 12 million spatially resolved single cells across 62 Xenium slides, SpatialFormer merges gene spatial expression profiles with cell niche information via the pair-wise training strategy. Our findings demonstrate that SpatialFormer distills biological signals across various tasks, including single-cell batch correction, cell-type annotation, co-localization detection, and identifying gene pairs that are critical for the immune cell-cell interactions involved in the regulation of lung fibrosis. These advancements enhance our understanding of cellular dynamics and offer new pathways for applications in biomedical research.
11
+
12
+
13
+ ## System Requirements
14
+ ### Hardware requirements
15
+ We provide GPU and CPU versions for users with different hardware. However, when a large number of cells needs to be processed, a GPU is required to obtain results efficiently. Both AMD and NVIDIA GPUs are supported.
16
+ ### Software requirements
17
+ #### OS requirements
18
+ This package is supported on macOS and Linux. It has been tested on the following systems:
19
+ - macOS: Sequoia (15.3.1)
20
+ - Linux: Ubuntu 16.04; SLES 15.5
21
+
22
+ #### Python environment requirements
23
+ Create the spatialformer environment with Anaconda (Python >= 3.9 required):
24
+ ```bash
25
+ conda create -n spatialformer python=3.9
26
+ ```
27
+
28
+ ## Installation
29
+
30
+ ### Install from PyPI
31
+ If you are using AMD GPUs:
32
+ ```bash
33
+ pip install SpatialFormer --extra-index-url https://download.pytorch.org/whl/rocm6.0
34
+ ```
35
+ Alternatively, if you are using NVIDIA GPUs:
36
+ ```bash
37
+ pip install SpatialFormer --extra-index-url https://download.pytorch.org/whl/cu121
38
+ ```
39
+
40
+ ### Install from GitHub
41
+ ```bash
42
+ git clone https://github.com/TerminatorJ/Spatialformer/
43
+ cd Spatialformer
44
+ ```
45
+ If you are using AMD GPUs:
46
+ ```bash
47
+ pip install -e . --extra-index-url https://download.pytorch.org/whl/rocm6.0
48
+ ```
49
+ Alternatively, if you are using NVIDIA GPUs:
50
+ ```bash
51
+ pip install -e . --extra-index-url https://download.pytorch.org/whl/cu121
52
+ ```
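The install commands above differ only in the PyTorch wheel index they add. As a minimal sketch, the choice could be scripted as below. The helper name `pip_install_command` and the backend labels are illustrative and not part of the package; the two index URLs are the ones listed in this README, and treating a CPU install as using the default index is an assumption.

```python
# Hypothetical helper, not part of SpatialFormer.
# The index URLs are the ones given in the install instructions.
TORCH_INDEX = {
    "rocm": "https://download.pytorch.org/whl/rocm6.0",  # AMD GPUs
    "cuda": "https://download.pytorch.org/whl/cu121",    # NVIDIA GPUs
}


def pip_install_command(backend, editable=False):
    """Build the pip argument list for "rocm", "cuda" or "cpu"."""
    # An editable install targets the cloned repository ("."),
    # otherwise the package is pulled from PyPI.
    target = ["-e", "."] if editable else ["SpatialFormer"]
    cmd = ["pip", "install", *target]
    if backend == "cpu":
        return cmd  # assumption: CPU installs need no extra index
    if backend not in TORCH_INDEX:
        raise ValueError(f"unknown backend: {backend!r}")
    return cmd + ["--extra-index-url", TORCH_INDEX[backend]]


print(" ".join(pip_install_command("cuda")))
print(" ".join(pip_install_command("rocm", editable=True)))
```

The returned list can be passed directly to `subprocess.run` if the install should be triggered from a setup script.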
53
+
54
+
55
+
56
+
57
+ ## Pretraining data
58
+
59
+ The model accepts both individual cells and cell pairs (doublets) as input. It was pretrained on a large-scale dataset of positive and negative cell pairs. Positive pairs consist of a query cell and any cell located within its niche. Negative pairs can include any distant cells that are either far away from the query cell or just one hop away from its corresponding positive pairs. Positive and negative pairs are kept at a 1:1 ratio, so that each training batch contains 50% positive and 50% negative examples.
60
+
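The sampling scheme described above can be illustrated with a small, self-contained sketch. It is not the preprocessing code from `/data_preprocess/`: the function name `build_pairs`, the use of a fixed distance cutoff to define a niche, and the toy coordinates are all assumptions made for illustration, and the "one hop away" negatives are not modelled.

```python
import math
import random


def build_pairs(coords, radius, seed=0):
    """Return (positives, negatives) as lists of (query_id, other_id).

    coords: dict mapping cell ID -> (x, y) position
    radius: distance cutoff that defines a niche (assumption of this sketch)
    """
    rng = random.Random(seed)
    ids = sorted(coords)
    positives, distant = [], []
    for query in ids:
        for other in ids:
            if other == query:
                continue
            d = math.dist(coords[query], coords[other])
            # Cells inside the cutoff form positive pairs, the rest negatives.
            (positives if d <= radius else distant).append((query, other))
    # Downsample the larger set so the ratio is exactly 1:1.
    n = min(len(positives), len(distant))
    return rng.sample(positives, n), rng.sample(distant, n)


# Toy layout: three neighbouring cells and a separate group of two.
coords = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (10, 10), "e": (11, 10)}
pos, neg = build_pairs(coords, radius=2.0)
print(len(pos), len(neg))
```

The all-pairs loop is quadratic in the number of cells, so for real slides a spatial index would be needed; the sketch only shows how the 1:1 balance is obtained.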
61
+ The processed individual cell dataset can be retrieved from the Hugging Face dataset repository at [SpatialCC-12M](https://huggingface.co/datasets/TerminatorJ/xenium_pandavid_dataset). The pairwise data can be generated by following the instructions provided in `/data_preprocess/`.
62
+
63
+ You can download the dataset in Python as follows:
64
+ ```python
65
+ from datasets import load_dataset
66
+ spatialcc = load_dataset("TerminatorJ/xenium_pandavid_dataset", cache_dir="your_cache_dir")
67
+ ```
68
+ Alternatively, the original pretraining dataset, [SpatialCC-12M](https://figshare.com/articles/dataset/SpatialCC-12M/28436606?file=52449404), can be downloaded from figshare (DOI: https://doi.org/10.6084/m9.figshare.28436606.v1).
69
+
70
+
71
+
72
+ ## Get the Embeddings
73
+
74
+ SpatialFormer provides a simple function to extract embeddings. The `sp.tl.embed_data()` function integrates with the AnnData object: the generated embeddings are stored in `obsm` under the key `"X_SpaF"`.
75
+
76
+ SpatialFormer supports two methods for generating embeddings: 1) single input mode and 2) pairwise input mode. Below is an example of generating the AnnData embeddings:
77
+
78
+ A small example AnnData file can be downloaded [here](downstream/cell_cell_communication/data/covid_subsampled.h5ad).
79
+
80
+
81
+ #### Loading the anndata
82
+
83
+ ```python
84
+ import scanpy as sc
85
+ adata = sc.read_h5ad("./downstream/cell_cell_communication/data/covid_subsampled.h5ad")
86
+ ```
87
+
88
+
89
+ #### Single Input Mode
90
+ ```python
91
+ import spatialformer as sp
92
+ embed_adata = sp.tl.embed_data(adata,
93
+ tissue,
94
+ condition,
95
+ method,
96
+ model_ckp_path,
97
+ batch_size,
98
+ mode = "single",
99
+ threshold = 0.7,
100
+ num_workers = 8
101
+ )
102
+ ```
103
+ #### Pairwise Input Mode
104
+ ```python
105
+ embed_adata = sp.tl.embed_data(adata,
106
+ tissue,
107
+ condition,
108
+ method,
109
+ model_ckp_path,
110
+ batch_size,
111
+ mode = "pair",
112
+ left_cell = ["aacid_0789", "aacid_0799"],
113
+ right_cell = ["aacid_0635", "aacid_0652"],
114
+ num_workers = 8
115
+ )
116
+ ```
117
+
118
+ | Arguments | dtype |Description |
119
+ | :------------------------ | :--------- | :--------- |
120
+ | adata | object | An AnnData object that stores expression information by CellXGene.|
121
+ | tissue | string | The type of tissue (e.g., Breast/Lung).|
122
+ | condition | string | Metadata for the sample condition (e.g., Disease/Healthy). |
123
+ | method | string | The method of the embed function, which can be either "single" or "pair". In "single" mode, only individual cells are collated as input for the model. In "pair" mode, data is prepared for pairwise input: both left_cell and right_cell must be provided, and they must have the same length. Each cell ID in left_cell corresponds to the cell ID at the same index in right_cell. |
124
+ | model_ckp_path | string | The path to the SpatialFormer model checkpoint.|
125
+ | batch_size | integer | The batch size for the data loader.|
126
+ | threshold | float | The threshold for filtering whether two genes are paired, which helps in identifying confidently paired genes at subcellular resolution. This option is applicable only in "single" input mode and is not functional in "pair" mode.|
127
+ | left_cell | array_like | A list of cell IDs representing the query cells.|
128
+ | right_cell | array_like | A list of cell IDs representing the key cells. |
129
+ | num_workers | integer | The number of CPU worker processes used to load the data. This value should match the number of workers specified in the data loader.|
130
+
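The constraints on `left_cell` and `right_cell` listed in the table can be checked before calling the embedding function. The following is a minimal sketch under stated assumptions: `check_pairs` is a hypothetical helper, not part of SpatialFormer, and `obs_names` stands for the cell IDs of the AnnData object (for example `adata.obs_names`), represented here as a plain list.

```python
def check_pairs(obs_names, left_cell, right_cell):
    """Validate pairwise inputs and return them as (left, right) tuples.

    Hypothetical helper: checks that both lists have the same length
    and that every cell ID is present in obs_names.
    """
    if len(left_cell) != len(right_cell):
        raise ValueError(
            f"left_cell has {len(left_cell)} IDs but "
            f"right_cell has {len(right_cell)}"
        )
    known = set(obs_names)
    missing = [c for c in [*left_cell, *right_cell] if c not in known]
    if missing:
        raise KeyError(f"cell IDs not found: {missing}")
    # Index i of left_cell is paired with index i of right_cell.
    return list(zip(left_cell, right_cell))


# Cell IDs taken from the pairwise example above.
obs_names = ["aacid_0789", "aacid_0799", "aacid_0635", "aacid_0652"]
pairs = check_pairs(
    obs_names,
    left_cell=["aacid_0789", "aacid_0799"],
    right_cell=["aacid_0635", "aacid_0652"],
)
print(pairs)
```

Failing early with a clear message is cheaper than discovering a length mismatch inside the data loader.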
131
+ ### Star Trend
132
+
133
+ [![Star History Chart](https://api.star-history.com/svg?repos=TerminatorJ/Spatialformer&type=Date)](https://star-history.com/#TerminatorJ/Spatialformer&Date)
134
+
135
+
136
+
137
+ ## Cite our work
138
+ Wang J, Huang Y, Winther O. SpatialFormer: Universal Spatial Representation Learning from Subcellular Molecular to Multicellular Landscapes[J]. bioRxiv, 2025: 2025.01.18.633701.
139
+
140
+
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,22 @@
1
+ from setuptools import setup, find_packages
2
+
3
+
4
+ with open('./requirements.txt', 'r') as f:
5
+     requirements = f.read().splitlines()
6
+
7
+ setup(
8
+ name='spatialformer',
9
+ version='0.0.6',
10
+ author='TerminatorJ',
11
+ author_email='wangjun19950708@gmail.com',
12
+ description='A single-cell foundation model focused on spatial cell-cell colocalization',
13
+ url='https://github.com/TerminatorJ/Spatialformer/',
14
+ classifiers=[
15
+ 'Programming Language :: Python :: 3',
16
+ 'License :: OSI Approved :: MIT License',
17
+ ],
18
+ python_requires='>=3.9',
19
+ install_requires=requirements,
20
+ packages=find_packages(include=['spatialformer', 'spatialformer.tools']),
21
+
22
+ )
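The `setup.py` above passes every line of `requirements.txt` to `install_requires`, including any blank lines, comments or pip options the file might contain. A more defensive reader could look like the sketch below. `parse_requirements` is a hypothetical helper and the sample text is made up for illustration; it is not the project's actual requirements file.

```python
def parse_requirements(text):
    """Return requirement specifiers, skipping blanks, comments and pip options."""
    requirements = []
    for line in text.splitlines():
        # Drop trailing inline comments (" # ...") and surrounding whitespace.
        line = line.split(" #", 1)[0].strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("-"):
            # Options such as "-r other.txt" or "--extra-index-url ..."
            # are not valid install_requires entries.
            continue
        requirements.append(line)
    return requirements


# Made-up sample content, for illustration only.
sample = """\
packageA>=1.0

# a full-line comment
packageB  # an inline comment
--extra-index-url https://example.invalid/simple
"""
print(parse_requirements(sample))
```

In `setup.py`, the result would replace the direct `f.read().splitlines()` call, for example `install_requires=parse_requirements(f.read())`.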