ourotools 0.2.0__tar.gz
- ourotools-0.2.0/LICENSE +21 -0
- ourotools-0.2.0/MANIFEST.in +1 -0
- ourotools-0.2.0/PKG-INFO +545 -0
- ourotools-0.2.0/README.md +520 -0
- ourotools-0.2.0/ourotools/__init__.py +10 -0
- ourotools-0.2.0/ourotools/__main__.py +10 -0
- ourotools-0.2.0/ourotools/core/BA.py +225 -0
- ourotools-0.2.0/ourotools/core/MAP.py +38 -0
- ourotools-0.2.0/ourotools/core/ONT.py +174 -0
- ourotools-0.2.0/ourotools/core/OT.py +125 -0
- ourotools-0.2.0/ourotools/core/SAM.py +588 -0
- ourotools-0.2.0/ourotools/core/SC.py +647 -0
- ourotools-0.2.0/ourotools/core/SEQ.py +99 -0
- ourotools-0.2.0/ourotools/core/STR.py +398 -0
- ourotools-0.2.0/ourotools/core/__init__.py +3 -0
- ourotools-0.2.0/ourotools/core/alternative_splicing_analysis.py +903 -0
- ourotools-0.2.0/ourotools/core/biobookshelf.py +2538 -0
- ourotools-0.2.0/ourotools/core/core.py +17252 -0
- ourotools-0.2.0/ourotools.egg-info/PKG-INFO +545 -0
- ourotools-0.2.0/ourotools.egg-info/SOURCES.txt +24 -0
- ourotools-0.2.0/ourotools.egg-info/dependency_links.txt +1 -0
- ourotools-0.2.0/ourotools.egg-info/entry_points.txt +2 -0
- ourotools-0.2.0/ourotools.egg-info/requires.txt +13 -0
- ourotools-0.2.0/ourotools.egg-info/top_level.txt +1 -0
- ourotools-0.2.0/setup.cfg +4 -0
- ourotools-0.2.0/setup.py +46 -0
ourotools-0.2.0/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2022 ahs2202
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
|
ourotools-0.2.0/PKG-INFO
ADDED
|
@@ -0,0 +1,545 @@
|
|
|
1
|
+
Metadata-Version: 2.1
|
|
2
|
+
Name: ourotools
|
|
3
|
+
Version: 0.2.0
|
|
4
|
+
Summary: A comprehensive toolkit for quality control and analysis of single-cell long-read RNA-seq data
|
|
5
|
+
Home-page: https://github.com/ahs2202/ouro-tools
|
|
6
|
+
Author: Hyunsu An
|
|
7
|
+
Author-email: ahs2202@gm.gist.ac.kr
|
|
8
|
+
License: GPLv3
|
|
9
|
+
Requires-Python: >=3.8, <4
|
|
10
|
+
Description-Content-Type: text/markdown
|
|
11
|
+
License-File: LICENSE
|
|
12
|
+
Requires-Dist: pysam>=0.18.0
|
|
13
|
+
Requires-Dist: bitarray>=2.5.1
|
|
14
|
+
Requires-Dist: scipy>=1.9.1
|
|
15
|
+
Requires-Dist: tqdm>=4.64.1
|
|
16
|
+
Requires-Dist: nest-asyncio>=1.5.6
|
|
17
|
+
Requires-Dist: joblib>=1.2.0
|
|
18
|
+
Requires-Dist: pandas>=1.5.2
|
|
19
|
+
Requires-Dist: intervaltree>=3.1.0
|
|
20
|
+
Requires-Dist: matplotlib>=3.5.2
|
|
21
|
+
Requires-Dist: mappy>=2.24
|
|
22
|
+
Requires-Dist: h5py>=3.8.0
|
|
23
|
+
Requires-Dist: pyBigWig>=0.3.22
|
|
24
|
+
Requires-Dist: plotly>=5.18.0
|
|
25
|
+
|
|
26
|
+
<h1 align="center">
|
|
27
|
+
<a href="https://github.com/ahs2202/ouro-tools"><img src="doc/img/ourotools-logo-css.svg" width="850" height="189"></a>
|
|
28
|
+
<br><br>
|
|
29
|
+
<a href="https://github.com/ahs2202/ouro-tools">Ouro-Tools</a> - <em>long-read scRNA-seq</em> toolkit
|
|
30
|
+
</h1>
|
|
31
|
+
|
|
32
|
+
Ouro-Tools is a novel, comprehensive computational pipeline for long-read scRNA-seq with the following key features. Ouro-Tools **(1) normalizes mRNA size distributions** and **(2) detects mRNA 7-methylguanosine caps** to integrate multiple single-cell long-read RNA-sequencing experiments across modalities and characterize full-length transcripts, respectively.
|
|
33
|
+
|
|
34
|
+
<p align="center">
|
|
35
|
+
<a href="https://github.com/ahs2202/ouro-tools"><img src="doc/img/ourotools-intro.SVG" width="850" height="412"></a>
|
|
36
|
+
</p>
|
|
37
|
+
|
|
38
|
+
|
|
39
|
+
|
|
40
|
+
## Table of Contents
|
|
41
|
+
|
|
42
|
+
- [Table of Contents](#table-of-contents)
|
|
43
|
+
|
|
44
|
+
- [Introduction](#introduction)
|
|
45
|
+
|
|
46
|
+
- [What is long-read scRNA-seq?](#what-is-long-read-scRNA-seq)
|
|
47
|
+
|
|
48
|
+
- [Installation](#installation)
|
|
49
|
+
|
|
50
|
+
- [Before starting the tutorial](#before-start)
|
|
51
|
+
|
|
52
|
+
- [Download our *toy* long-read scRNA-seq datasets](#toy-datasets)
|
|
53
|
+
- [Basic settings for running the entire pipeline](#basic-settings)
|
|
54
|
+
|
|
55
|
+
- [*step 1)* Raw long-read pre-processing module](#preprocessing)
|
|
56
|
+
|
|
57
|
+
- [*step 2)* Spliced alignment](#alignment)
|
|
58
|
+
|
|
59
|
+
- [*step 3)* Barcode extraction module](#barcode-extraction)
|
|
60
|
+
|
|
61
|
+
- [*step 4)* Biological full-length molecule identification module](#full-length-ID)
|
|
62
|
+
|
|
63
|
+
- [*step 5)* Size distribution normalization module](#size-normalization)
|
|
64
|
+
|
|
65
|
+
- [*step 6)* Single-cell count module](#single-cell-count-module)
|
|
66
|
+
|
|
67
|
+
- [*step 7)* Visualization](#visualization)
|
|
68
|
+
|
|
69
|
+
- [*wrap-up)* Running the entire pipeline using a wrapper function](#run-entire-pipeline)
|
|
70
|
+
|
|
71
|
+
- [Pre-built indices of unwanted genomic sequences for pre-processing](#pre-built-unwanted-genomic-sequences)
|
|
72
|
+
|
|
73
|
+
- [An Ouro-Tools count module index](#count-module-index)
|
|
74
|
+
|
|
75
|
+
- [Pre-built count module index](#pre-built-index)
|
|
76
|
+
- [Building index from scratch](#building-index)
|
|
77
|
+
- [*optional input annotations*](#optional-input-annotations)
|
|
78
|
+
|
|
79
|
+
- [SAM Tags](#SAM-tags)
|
|
80
|
+
|
|
81
|
+
- [Bitwise flags](#bitwise-flags)
|
|
82
|
+
|
|
83
|
+
|
|
84
|
+
|
|
85
|
+
## Introduction <a name="introduction"></a>
|
|
86
|
+
|
|
87
|
+
The Ouro-Tools pipeline comprises five main modules, allowing seamless integration with existing bulk and single-cell long-read RNA-seq pipelines and tools. Every main module of Ouro-Tools utilizes efficient parallelization for compute-intensive tasks to facilitate the processing of large datasets. Additionally, each Ouro-Tools module employs filesystem-based locks for parallel processing of a large number of samples across multiple machines for scalability.
|
|
88
|
+
|
|
89
|
+
|
|
90
|
+
|
|
91
|
+
### What is long-read scRNA-seq? <a name="what-is-long-read-scRNA-seq"></a>
|
|
92
|
+
|
|
93
|
+

|
|
94
|
+
|
|
95
|
+
(Figure adapted from Volden & Vollmers, Genome Biol. 23:47 (2022), and made available under [Creative Commons license 4.0](https://creativecommons.org/licenses/by/4.0/) by Oxford Nanopore Technologies plc.)
|
|
96
|
+
|
|
97
|
+
In 2013, 2019, and 2022, “single-cell sequencing,” “single-cell multimodal omics,” and “long-read sequencing” were chosen as “Method of the Year” by *Nature Methods* journal, respectively, highlighting the urgent need to understand biology at the resolution of individual cells and individual biological molecules. Long-read scRNA-seq is a method that combines the single-cell RNA sequencing and long-read sequencing ([Nanopore](https://nanoporetech.com/applications/investigations/single-cell-sequencing) and [PacBio](https://www.pacb.com/products-and-services/applications/rna-sequencing/single-cell-rna-sequencing/)) methods.
|
|
98
|
+
|
|
99
|
+
|
|
100
|
+
|
|
101
|
+
## Installation <a name="installation"></a>
|
|
102
|
+
|
|
103
|
+
The latest stable version of Ouro-Tools is available via https://pypi.org/
|
|
104
|
+
|
|
105
|
+
In order to install the latest, unreleased version of Ouro-Tools, run the following commands in bash shell.
|
|
106
|
+
|
|
107
|
+
```bash
|
|
108
|
+
git clone https://github.com/ahs2202/ouro-tools.git
|
|
109
|
+
cd ouro-tools
|
|
110
|
+
pip install .
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
|
|
114
|
+
|
|
115
|
+
Ouro-Tools can be used from the command line, in a Python script, or in an interactive Python interpreter (e.g., Jupyter Notebook).
|
|
116
|
+
|
|
117
|
+
To print the command line usage example of each module from the bash shell, please type the following command.
|
|
118
|
+
|
|
119
|
+
|
|
120
|
+
|
|
121
|
+
**Bash shell**
|
|
122
|
+
|
|
123
|
+
```bash
|
|
124
|
+
ourotools LongFilterNSplit -h
|
|
125
|
+
```
|
|
126
|
+
|
|
127
|
+
**IPython environment (Jupyter notebook)**
|
|
128
|
+
|
|
129
|
+
```python
|
|
130
|
+
ourotools.LongFilterNSplit?
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
|
|
134
|
+
|
|
135
|
+
## Before starting the tutorial<a name="before-start"></a>
|
|
136
|
+
|
|
137
|
+
|
|
138
|
+
|
|
139
|
+
### Download our *toy* long-read scRNA-seq datasets <a name="toy-datasets"></a>
|
|
140
|
+
|
|
141
|
+
```bash
|
|
142
|
+
# download toy datasets from mouse ovary and testis
|
|
143
|
+
wget https://ouro-tools.s3.amazonaws.com/tutorial/mOvary.subsampled.fastq.gz
|
|
144
|
+
wget https://ouro-tools.s3.amazonaws.com/tutorial/mTestis2.subsampled.fastq.gz
|
|
145
|
+
```
|
|
146
|
+
|
|
147
|
+
Alternatively, you can download the files directly in your browser using the following links: [mOvary](https://ouro-tools.s3.amazonaws.com/tutorial/mOvary.subsampled.fastq.gz) and [mTestis](https://ouro-tools.s3.amazonaws.com/tutorial/mTestis2.subsampled.fastq.gz).
|
|
148
|
+
|
|
149
|
+
|
|
150
|
+
|
|
151
|
+
### Basic settings for running the entire pipeline<a name="basic-settings"></a>
|
|
152
|
+
|
|
153
|
+
```python
|
|
154
|
+
import ourotools
|
|
155
|
+
|
|
156
|
+
|
|
157
|
+
# global multiprocessing settings
|
|
158
|
+
ourotools.bk.int_max_num_batches_in_a_queue_for_each_worker = 1
|
|
159
|
+
n_workers = 2 # employ 2 workers (since there are two samples, 2 workers are sufficient)
|
|
160
|
+
n_threads_for_each_worker = 8 # use 8 CPU cores for each worker
|
|
161
|
+
|
|
162
|
+
|
|
163
|
+
# dataset-specific settings
|
|
164
|
+
path_folder_data = '/home/project/Single_Cell_Full_Length_Atlas/data/pipeline/20220331_Ouroboros_Project/pipeline/20230208_Mouse_Long_Read_Single_Cell_Atlas/pipeline/20230811_mouse_long_read_single_cell_atlas_v202308/tutorial_data/20240728_ovary_testis_tutorial/'
|
|
165
|
+
l_name_sample = [
|
|
166
|
+
'mOvary.subsampled',
|
|
167
|
+
'mTestis2.subsampled',
|
|
168
|
+
]
|
|
169
|
+
|
|
170
|
+
|
|
171
|
+
# scRNA-seq technology-specific settings
|
|
172
|
+
path_file_valid_barcode_list = '/home/project/Single_Cell_Full_Length_Atlas/data/pipeline/20210728_development_ouroboros_qc/example/3M-february-2018.txt.gz' # GEX v3 CB
|
|
173
|
+
|
|
174
|
+
|
|
175
|
+
# species-specific settings
|
|
176
|
+
path_file_minimap_index_genome = '/home/shared/ensembl/Mus_musculus/index/minimap2/Mus_musculus.GRCm38.dna.primary_assembly.k_14.idx'
|
|
177
|
+
path_file_minimap_splice_junction = '/home/shared/ensembl/Mus_musculus/Mus_musculus.GRCm38.102.paftools.bed'
|
|
178
|
+
path_file_minimap_unwanted = '/home/project/Single_Cell_Full_Length_Atlas/data/accessory_data/cDNA_depletion/index/minimap2/MT_and_rRNA_GRCm38.fa.ont.mmi'
|
|
179
|
+
path_folder_count_module_index = '/home/project/Single_Cell_Full_Length_Atlas/data/pipeline/20211116_ouroboros_short_read_public_data_mining/scarab_annotations/Mus_musculus.GRCm38.102.v0.2.4/' # path to the Ouro-Tools count module index
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
To find the barcode whitelist specific to your scRNA-seq experiment, please refer to [the official 10x Genomics article](https://kb.10xgenomics.com/hc/en-us/articles/115004506263-What-is-a-barcode-whitelist). A pre-built Ouro-Tools count module index can be downloaded [here](#pre-built-index). Pre-built indices of unwanted sequences (ribosomal DNA repeats and mitochondrial DNA) can be downloaded [here](#pre-built-unwanted-genomic-sequences).
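
Because the settings above point at several files and folders, it may help to confirm that they exist before launching a long run. The following is a minimal sketch, not part of Ouro-Tools; the helper name `find_missing_paths` is hypothetical and introduced only for illustration.

```python
from pathlib import Path


def find_missing_paths(paths):
    """Return the entries of `paths` (a dict of label -> path) that do not exist on disk.

    Hypothetical helper for illustration; it is not an Ouro-Tools function.
    """
    return {label: p for label, p in paths.items() if not Path(p).exists()}
```

For example, passing `{"barcode whitelist": path_file_valid_barcode_list, "genome index": path_file_minimap_index_genome}` returns an empty dictionary when both paths are present, and the offending entries otherwise.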
|
|
183
|
+
|
|
184
|
+
|
|
185
|
+
|
|
186
|
+
## *step 1)* Raw long-read pre-processing module (QC module)<a name="preprocessing"></a>
|
|
187
|
+
|
|
188
|
+
```python
|
|
189
|
+
# run LongFilterNSplit
|
|
190
|
+
ourotools.LongFilterNSplit(
|
|
191
|
+
path_file_minimap_index_genome = path_file_minimap_index_genome,
|
|
192
|
+
l_path_file_minimap_index_unwanted = [ path_file_minimap_unwanted ],
|
|
193
|
+
l_path_file_fastq_input = list( f"{path_folder_data}{name_sample}.fastq.gz" for name_sample in l_name_sample ),
|
|
194
|
+
l_path_folder_output = list( f"{path_folder_data}LongFilterNSplit_out/{name_sample}/" for name_sample in l_name_sample ),
|
|
195
|
+
int_num_samples_analyzed_concurrently = n_workers,
|
|
196
|
+
n_threads = n_workers * n_threads_for_each_worker,
|
|
197
|
+
)
|
|
198
|
+
```
|
|
199
|
+
|
|
200
|
+
As the first module of the Ouro-Tools pipeline, the raw long-read pre-processing module `LongFilterNSplit` serves two functions: (1) providing comprehensive quality control metrics of a long-read scRNA-seq experiment and (2) pre-processing raw long-read sequencing data for downstream analysis.
|
|
201
|
+
|
|
202
|
+
<p align="center">
|
|
203
|
+
<img src="doc/img/QC-example.svg" width="850" height="412">
|
|
204
|
+
</p>
|
|
205
|
+
|
|
206
|
+
According to the classification results, cDNA molecules are organized into separate output FASTQ files. For cDNA molecules that contain a single (external or internal) poly(A) tail, the read is re-oriented so that it has the same orientation as its original mRNA transcript, with the poly(A) tail at its 3’ end; the resulting long reads of cDNAs can be utilized for strand-specific long-read RNA-seq analysis.
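
The re-orientation idea can be illustrated with a deliberately simplified sketch. It assumes that the poly(A) run sits exactly at the 3’ end of the read (or the complementary poly(T) run exactly at the 5’ end) and ignores adaptors, internal poly(A) tracts, and sequencing errors, all of which a real implementation has to handle. The function names and the `min_len` threshold are hypothetical and do not reflect how `LongFilterNSplit` works internally.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_by_poly_a(seq, min_len=10):
    """Return `seq` in mRNA orientation, or None if the orientation is ambiguous.

    A read ending in a poly(A) run is already in mRNA orientation; a read
    starting with a poly(T) run is reverse-complemented. Reads with neither
    or both signals return None (illustrative rule only).
    """
    upper = seq.upper()
    has_3p_poly_a = upper.endswith("A" * min_len)
    has_5p_poly_t = upper.startswith("T" * min_len)
    if has_3p_poly_a and not has_5p_poly_t:
        return seq
    if has_5p_poly_t and not has_3p_poly_a:
        return reverse_complement(seq)
    return None
```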
|
|
207
|
+
|
|
208
|
+
|
|
209
|
+
|
|
210
|
+
## *step 2)* Spliced alignment <a name="alignment"></a>
|
|
211
|
+
|
|
212
|
+
```python
|
|
213
|
+
# align using minimap2 (requires that the minimap2 executable can be found in PATH)
|
|
214
|
+
# below is a wrapper function for minimap2
|
|
215
|
+
ourotools.Workers(
|
|
216
|
+
ourotools.ONT.Minimap2_Align, # function to deploy
|
|
217
|
+
int_num_workers_for_Workers = n_workers, # create 'n_workers' number of workers
|
|
218
|
+
# below are arguments for the function 'ourotools.ONT.Minimap2_Align'
|
|
219
|
+
path_file_fastq = list( f"{path_folder_data}LongFilterNSplit_out/{name_sample}/aligned_to_genome__non_chimeric__poly_A__plus_strand.fastq.gz" for name_sample in l_name_sample ),
|
|
220
|
+
path_folder_minimap2_output = list( f"{path_folder_data}minimap2_bam_genome/{name_sample}/" for name_sample in l_name_sample ),
|
|
221
|
+
path_file_junc_bed = path_file_minimap_splice_junction,
|
|
222
|
+
path_file_minimap2_index = path_file_minimap_index_genome,
|
|
223
|
+
n_threads = n_threads_for_each_worker,
|
|
224
|
+
)
|
|
225
|
+
```
|
|
226
|
+
|
|
227
|
+
*Minimap2* can be used for annotation-guided alignment based on the transcript annotations prepared by the researcher. Here, the reference annotation from Ensembl (*Ensembl release 102*) was utilized.
|
|
228
|
+
|
|
229
|
+
|
|
230
|
+
|
|
231
|
+
## *step 3)* Barcode extraction module <a name="barcode-extraction"></a>
|
|
232
|
+
|
|
233
|
+
```python
|
|
234
|
+
# run LongExtractBarcodeFromBAM
|
|
235
|
+
l_path_folder_barcodedbam = list( f"{path_folder_data}LongExtractBarcodeFromBAM_out/{name_sample}/" for name_sample in l_name_sample )
|
|
236
|
+
ourotools.LongExtractBarcodeFromBAM(
|
|
237
|
+
path_file_valid_cb = path_file_valid_barcode_list,
|
|
238
|
+
l_path_file_bam_input = list( f"{path_folder_data}minimap2_bam_genome/{name_sample}/aligned_to_genome__non_chimeric__poly_A__plus_strand.fastq.gz.minimap2_aligned.bam" for name_sample in l_name_sample ),
|
|
239
|
+
l_path_folder_output = l_path_folder_barcodedbam,
|
|
240
|
+
int_num_samples_analyzed_concurrently = n_workers,
|
|
241
|
+
n_threads = n_workers * n_threads_for_each_worker,
|
|
242
|
+
)
|
|
243
|
+
```
|
|
244
|
+
|
|
245
|
+
The barcode extraction module `LongExtractBarcodeFromBAM` identifies cell barcode (**CB**) and unique molecular identifier (**UMI**) sequences for each read and exports the results as a **“barcoded” BAM file**, a BAM file containing corrected CB and UMI sequences for each read using [the predefined SAM tags](#SAM-tags).
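
To make the notion of a "corrected" CB concrete, here is a minimal sketch of one common whitelist-based strategy: accept an exact match, otherwise accept a barcode only if exactly one whitelisted barcode lies at Hamming distance 1. This is an assumption for illustration; the source does not state which correction rule `LongExtractBarcodeFromBAM` applies, and `correct_barcode` is a hypothetical name.

```python
def correct_barcode(cb, whitelist):
    """Return a whitelisted barcode for `cb`, or None if none can be assigned.

    Exact matches are returned as-is. Otherwise every single-base substitution
    is tried, and a correction is accepted only when it is unambiguous.
    """
    if cb in whitelist:
        return cb
    candidates = set()
    for i, base in enumerate(cb):
        for alt in "ACGT":
            if alt == base:
                continue
            candidate = cb[:i] + alt + cb[i + 1:]
            if candidate in whitelist:
                candidates.add(candidate)
    return candidates.pop() if len(candidates) == 1 else None
```

Here `whitelist` would be a Python `set` loaded from the valid barcode list, so that each membership test is constant time.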
|
|
246
|
+
|
|
247
|
+
<p align="center">
|
|
248
|
+
<img src="doc/img/UMI-deduplication-example.svg" width="850" height="412">
|
|
249
|
+
</p>
|
|
250
|
+
|
|
251
|
+
|
|
252
|
+
|
|
253
|
+
## *step 4)* Biological full-length molecule identification module <a name="full-length-ID"></a>
|
|
254
|
+
|
|
255
|
+
```python
|
|
256
|
+
# run full-length ID module
|
|
257
|
+
# survey 5' sites for each sample
|
|
258
|
+
ourotools.LongSurvey5pSiteFromBAM(
|
|
259
|
+
l_path_folder_input = l_path_folder_barcodedbam,
|
|
260
|
+
int_num_samples_analyzed_concurrently = n_workers,
|
|
261
|
+
n_threads = n_workers * n_threads_for_each_worker,
|
|
262
|
+
)
|
|
263
|
+
# combine 5' site profiles across samples and classify each 5' profile
|
|
264
|
+
ourotools.LongClassify5pSiteProfiles(
|
|
265
|
+
l_path_folder_input = l_path_folder_barcodedbam,
|
|
266
|
+
path_folder_output = f"{path_folder_data}LongClassify5pSiteProfiles_out/",
|
|
267
|
+
n_threads = n_threads_for_each_worker,
|
|
268
|
+
)
|
|
269
|
+
# append 5' site classification results to each BAM file
|
|
270
|
+
ourotools.LongAdd5pSiteClassificationResultToBAM(
|
|
271
|
+
path_folder_input_5p_sites = f'{path_folder_data}LongClassify5pSiteProfiles_out/',
|
|
272
|
+
l_path_folder_input_barcodedbam = l_path_folder_barcodedbam,
|
|
273
|
+
int_num_samples_analyzed_concurrently = n_workers,
|
|
274
|
+
n_threads = n_workers * n_threads_for_each_worker,
|
|
275
|
+
)
|
|
276
|
+
# filter artifact reads from each BAM file
|
|
277
|
+
ourotools.Workers(
|
|
278
|
+
ourotools.FilterArtifactReadFromBAM, # function to deploy
|
|
279
|
+
int_num_workers_for_Workers = n_workers, # create 'n_workers' number of workers
|
|
280
|
+
# below are arguments for the function 'ourotools.FilterArtifactReadFromBAM'
|
|
281
|
+
path_file_bam_input = list( f'{path_folder_data}LongExtractBarcodeFromBAM_out/{name_sample}/5pSiteTagAdded/barcoded.bam' for name_sample in l_name_sample ),
|
|
282
|
+
path_folder_output = list( f'{path_folder_data}LongExtractBarcodeFromBAM_out/{name_sample}/5pSiteTagAdded/FilterArtifactReadFromBAM_out/' for name_sample in l_name_sample ),
|
|
283
|
+
)
|
|
284
|
+
```
|
|
285
|
+
|
|
286
|
+
The biological full-length identification module collects the lengths of guanosine homopolymers at the 5’ ends of cDNAs to identify genuine TSSs that produce capped mRNAs, depleting truncated cDNA molecules *in silico*. The module is implemented as a workflow consisting of `LongSurvey5pSiteFromBAM`, `LongClassify5pSiteProfiles`, `LongAdd5pSiteClassificationResultToBAM`, and `FilterArtifactReadFromBAM`.
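
The quantity being collected can be sketched as follows: for each read in mRNA orientation, count the run of G bases at its 5’ end, then tabulate those lengths. This is only an illustration under simplifying assumptions (the input is the bare 5’ sequence, with no adaptor or alignment context); how the module actually measures and classifies these profiles is not shown here, and both function names are hypothetical.

```python
from collections import Counter


def leading_g_length(seq):
    """Return the length of the G homopolymer at the start (5' end) of `seq`."""
    n = 0
    for base in seq.upper():
        if base != "G":
            break
        n += 1
    return n


def g_length_profile(seqs):
    """Tabulate 5' G-homopolymer lengths over an iterable of sequences."""
    return Counter(leading_g_length(s) for s in seqs)
```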
|
|
287
|
+
|
|
288
|
+
<p align="center">
|
|
289
|
+
<img src="doc/img/full-length-identification-example.svg" width="600" height="412">
|
|
290
|
+
</p>
|
|
291
|
+
|
|
292
|
+
|
|
293
|
+
|
|
294
|
+
## *step 5)* Size distribution normalization module <a name="size-normalization"></a>
|
|
295
|
+
|
|
296
|
+
```python
|
|
297
|
+
# run mRNA size distribution normalization module
|
|
298
|
+
# survey the size distribution of full-length mRNAs for each sample
|
|
299
|
+
l_full_length_bam = list( f'{path_folder_data}LongExtractBarcodeFromBAM_out/{name_sample}/5pSiteTagAdded/FilterArtifactReadFromBAM_out/valid_3p_valid_5p.bam' for name_sample in l_name_sample )
|
|
300
|
+
ourotools.Workers(
|
|
301
|
+
ourotools.LongSummarizeSizeDistributions,
|
|
302
|
+
int_num_workers_for_Workers = n_workers, # create 'n_workers' number of workers
|
|
303
|
+
path_file_bam_input = l_full_length_bam,
|
|
304
|
+
path_folder_output = list( f'{path_folder_data}LongExtractBarcodeFromBAM_out/{name_sample}/5pSiteTagAdded/FilterArtifactReadFromBAM_out/valid_3p_valid_5p.LongSummarizeSizeDistributions_out/' for name_sample in l_name_sample ),
|
|
305
|
+
)
|
|
306
|
+
# normalize size distributions
|
|
307
|
+
path_folder_size_norm = f"{path_folder_data}LongCreateReferenceSizeDistribution_out/"
|
|
308
|
+
ourotools.LongCreateReferenceSizeDistribution(
|
|
309
|
+
l_path_file_distributions = list( f'{path_folder_data}LongExtractBarcodeFromBAM_out/{name_sample}/5pSiteTagAdded/FilterArtifactReadFromBAM_out/valid_3p_valid_5p.LongSummarizeSizeDistributions_out/dict_arr_dist.pkl' for name_sample in l_name_sample ),
|
|
310
|
+
l_name_file_distributions = l_name_sample,
|
|
311
|
+
path_folder_output = path_folder_size_norm,
|
|
312
|
+
float_max_ratio_to_arr_dist_guassian_filter_min_sigma_for_dynamic_gaussian_filter_selection = 2,
|
|
313
|
+
float_sigma_gaussian_filter_min = 8,
|
|
314
|
+
int_min_total_read_count_for_a_peak = 30 ,
|
|
315
|
+
)
|
|
316
|
+
# based on the output, set the confident size range
|
|
317
|
+
str_confident_size_range = ourotools.get_confident_size_range( path_folder_size_norm )
|
|
318
|
+
```
|
|
319
|
+
|
|
320
|
+
The size distribution normalization module is implemented using the `LongSummarizeSizeDistributions` and `LongCreateReferenceSizeDistribution` workflows. First, using the `LongSummarizeSizeDistributions` workflow, a full-length, UMI-deduplicated cDNA size distribution is obtained from the `valid_3p_valid_5p` barcoded BAM file (representing ***in vivo* full-length mRNAs**) for each sample. Next, the reference mRNA size distribution is constructed for all the samples using the `LongCreateReferenceSizeDistribution` workflow.
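
As a rough illustration of what a UMI-deduplicated size distribution is, the sketch below counts each (CB, UMI) pair once and bins molecule lengths into a histogram. It assumes the records have already been reduced to `(cell_barcode, umi, length)` tuples and simply keeps the first length seen for a duplicated pair; the real workflow may resolve duplicates differently. The function name and the `bin_width` parameter are hypothetical.

```python
from collections import Counter


def size_distribution(records, bin_width=100):
    """Return a Counter mapping bin start (in bases) to molecule count.

    `records` is an iterable of (cell_barcode, umi, length) tuples. Each
    (cell_barcode, umi) pair contributes once, using its first length.
    """
    seen = set()
    dist = Counter()
    for cb, umi, length in records:
        key = (cb, umi)
        if key in seen:
            continue  # duplicate read of an already counted molecule
        seen.add(key)
        dist[(length // bin_width) * bin_width] += 1
    return dist
```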
|
|
321
|
+
|
|
322
|
+
<p align="center">
|
|
323
|
+
<img src="doc/img/size-normalization-example.svg" width="850" height="412">
|
|
324
|
+
</p>
|
|
325
|
+
|
|
326
|
+
|
|
327
|
+
|
|
328
|
+
## *step 6)* Single-cell count module <a name="single-cell-count-module"></a>
|
|
329
|
+
|
|
330
|
+
```python
|
|
331
|
+
# run the single-cell count module
|
|
332
|
+
ourotools.LongExportNormalizedCountMatrix(
|
|
333
|
+
path_folder_ref = path_folder_count_module_index,
|
|
334
|
+
l_path_file_bam_input = l_full_length_bam,
|
|
335
|
+
l_path_folder_output = list( f'{path_folder_data}LongExportNormalizedCountMatrix_out/{name_sample}/' for name_sample in l_name_sample ),
|
|
336
|
+
l_name_distribution = l_name_sample,
|
|
337
|
+
path_folder_reference_distribution = path_folder_size_norm,
|
|
338
|
+
l_str_l_t_distribution_range_of_interest = [ ','.join( [ "raw", str_confident_size_range ] ) ],
|
|
339
|
+
flag_enforce_transcript_start_site_matching_for_long_read_during_realignment = True,
|
|
340
|
+
flag_enforce_transcript_end_site_matching_for_long_read_during_realignment = True,
|
|
341
|
+
)
|
|
342
|
+
```
|
|
343
|
+
|
|
344
|
+
The single-cell long-read count module `LongExportNormalizedCountMatrix` consists of three main parts: [constructing an index](#count-module-index) (required only once for each combination of gene, transcript, repeat element, and regulatory element annotations and reference genome), assigning each read to various `buckets` (each `bucket` represents a gene, transcript, exon, splice junction, TE, tCRE, or individual genomic tile), and exporting a size distribution-normalized count matrix for each `bucket` (these count matrices are later combined into a single size distribution-normalized count matrix as the output).
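
For the genomic-tile `bucket` type, the assignment step can be pictured with a small sketch that maps an aligned interval to the fixed-width tiles it overlaps. It assumes 0-based, half-open coordinates and a tile width of 1,000 bp; both the function name and the tile width are assumptions for illustration, not values taken from Ouro-Tools.

```python
def tiles_overlapping(start, end, tile_size=1000):
    """Return the indices of fixed-width tiles overlapped by [start, end).

    Coordinates are 0-based and half-open; tile i covers
    [i * tile_size, (i + 1) * tile_size). Empty intervals yield [].
    """
    if end <= start:
        return []
    first = start // tile_size
    last = (end - 1) // tile_size
    return list(range(first, last + 1))
```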
|
|
345
|
+
|
|
346
|
+
|
|
347
|
+
|
|
348
|
+
## *step 7)* Visualization <a name="visualization"></a>
|
|
349
|
+
|
|
350
|
+
```python
|
|
351
|
+
# TBD
|
|
352
|
+
```
|
|
353
|
+
|
|
354
|
+
|
|
355
|
+
|
|
356
|
+
## *wrap-up)* Running the entire pipeline using a wrapper function<a name="run-entire-pipeline"></a>
|
|
357
|
+
|
|
358
|
+
```python
|
|
359
|
+
# version 2024-08-09 19:02:27 by Hyunsu An @ GIST-FGL
|
|
360
|
+
import ourotools
|
|
361
|
+
|
|
362
|
+
ourotools.run_pipeline(
|
|
363
|
+
# dataset setting
|
|
364
|
+
path_folder_data = '/home/project/Single_Cell_Full_Length_Atlas/data/pipeline/20220331_Ouroboros_Project/pipeline/20230208_Mouse_Long_Read_Single_Cell_Atlas/pipeline/20230811_mouse_long_read_single_cell_atlas_v202308/tutorial_data/20240813_ovary_testis_tutorial2/',
|
|
365
|
+
l_name_sample = [
|
|
366
|
+
'mOvary.subsampled',
|
|
367
|
+
'mTestis2.subsampled',
|
|
368
|
+
],
|
|
369
|
+
# scRNA-seq technology-specific
|
|
370
|
+
path_file_valid_barcode_list = '/home/project/Single_Cell_Full_Length_Atlas/data/pipeline/20210728_development_ouroboros_qc/example/3M-february-2018.txt.gz', # GEX v3 CB
|
|
371
|
+
# species-specific settings
|
|
372
|
+
path_file_minimap_index_genome = '/home/shared/ensembl/Mus_musculus/index/minimap2/Mus_musculus.GRCm38.dna.primary_assembly.k_14.idx',
|
|
373
|
+
path_file_minimap_splice_junction = '/home/shared/ensembl/Mus_musculus/Mus_musculus.GRCm38.102.paftools.bed',
|
|
374
|
+
path_file_minimap_unwanted = '/home/project/Single_Cell_Full_Length_Atlas/data/accessory_data/cDNA_depletion/index/minimap2/MT_and_rRNA_GRCm38.fa.ont.mmi',
|
|
375
|
+
path_folder_count_module_index = '/home/project/Single_Cell_Full_Length_Atlas/data/pipeline/20211116_ouroboros_short_read_public_data_mining/scarab_annotations/Mus_musculus.GRCm38.102.v0.2.4/', # path to the Ouro-Tools reference
|
|
376
|
+
# run setting
|
|
377
|
+
n_workers = 2, # employ 2 workers (since there are two samples, 2 workers are sufficient)
|
|
378
|
+
n_threads_for_each_worker = 8, # use 8 CPU cores for each worker
|
|
379
|
+
# additional settings
|
|
380
|
+
args = dict(
|
|
381
|
+
LongCreateReferenceSizeDistribution = dict(
|
|
382
|
+
float_max_ratio_to_arr_dist_guassian_filter_min_sigma_for_dynamic_gaussian_filter_selection = 2,
|
|
383
|
+
float_sigma_gaussian_filter_min = 8,
|
|
384
|
+
int_min_total_read_count_for_a_peak = 30 ,
|
|
385
|
+
),
|
|
386
|
+
LongExportNormalizedCountMatrix = dict(
|
|
387
|
+
flag_enforce_transcript_start_site_matching_for_long_read_during_realignment = True,
|
|
388
|
+
flag_enforce_transcript_end_site_matching_for_long_read_during_realignment = True,
|
|
389
|
+
),
|
|
390
|
+
),
|
|
391
|
+
)
|
|
392
|
+
```
|
|
393
|
+
|
|
394
|
+
|
|
395
|
+
|
|
396
|
+
## Pre-built indices of unwanted genomic sequences for pre-processing<a name="pre-built-unwanted-genomic-sequences"></a>
|
|
397
|
+
|
|
398
|
+
The pre-built indices of unwanted sequences can be downloaded using the following links:
|
|
399
|
+
|
|
400
|
+
*<u>Human (GRCh38)</u>* : [Minimap2-index-file](https://ouro-tools.s3.amazonaws.com/miscellaneous/MT_and_rRNA_GRCh38.fa.ont.mmi), [FASTA-file](https://ouro-tools.s3.amazonaws.com/miscellaneous/MT_and_rRNA_GRCh38.fa), [GTF-file](https://ouro-tools.s3.amazonaws.com/miscellaneous/MT_and_rRNA_GRCh38.gtf)
|
|
401
|
+
|
|
402
|
+
*<u>Mouse (GRCm38)</u>* : [Minimap2-index-file](https://ouro-tools.s3.amazonaws.com/miscellaneous/MT_and_rRNA_GRCm38.fa.ont.mmi), [FASTA-file](https://ouro-tools.s3.amazonaws.com/miscellaneous/MT_and_rRNA_GRCm38.fa), [GTF-file](https://ouro-tools.s3.amazonaws.com/miscellaneous/MT_and_rRNA_GRCm38.gtf)
|
|
403
|
+
|
|
404
|
+
|
|
405
|
+
|
|
406
|
+
## An Ouro-Tools count module index<a name="count-module-index"></a>
|
|
407
|
+
|
|
408
|
+
The single-cell count module of Ouro-Tools utilizes <u>genome, transcriptome, and gene annotations</u> to assign reads to **genes, isoforms, and genomic bins (tiles across the genome)**. The index building process is automatic; <u>there is no need to run a separate command to build the index</u>. Once Ouro-Tools has processed this information before analyzing the input BAM file(s), the program saves an index so that the information loads much faster next time.
|
|
409
|
+
|
|
410
|
+
We recommend using <u>***Ensembl*** reference genome, transcriptome, and gene annotations of the same version</u> (release number).
|
|
411
|
+
|
|
412
|
+
|
|
413
|
+
|
|
414
|
+
### Pre-built index <a name="pre-built-index"></a>
|
|
415
|
+
|
|
416
|
+
Pre-built indices can be downloaded using the following links (each archive should be extracted to a folder using the **tar -xf** command):
|
|
417
|
+
|
|
418
|
+
[*<u>Human (GRCh38, Ensembl version 105)</u>*](https://ouro-tools.s3.amazonaws.com/index/latest/Homo_sapiens.GRCh38.105.v0.2.4.tar)
|
|
419
|
+
|
|
420
|
+
[*<u>Mouse (GRCm39, Ensembl version 105)</u>*](https://ouro-tools.s3.amazonaws.com/index/latest/Mus_musculus.GRCm39.105.v0.2.4.tar)
|
|
421
|
+
|
|
422
|
+
[*<u>Mouse (GRCm38, Ensembl version 102)</u>*](https://ouro-tools.s3.amazonaws.com/index/latest/Mus_musculus.GRCm38.102.v0.2.4.tar)
|
|
423
|
+
|
|
424
|
+
[*<u>Zebrafish (GRCz11, Ensembl version 104)</u>*](https://ouro-tools.s3.amazonaws.com/index/latest/Danio_rerio.GRCz11.104.v0.2.4.tar)
|
|
425
|
+
|
|
426
|
+
[*<u>Thale cress (TAIR10, Ensembl Plant version 56)</u>*](https://ouro-tools.s3.amazonaws.com/index/latest/Arabidopsis_thaliana.TAIR10.56.v0.2.4.tar)
|
|
427
|
+
|
|
428
|
+
|
|
429
|
+
|
|
430
|
+
### Building index from scratch <a name="building-index"></a>
|
|
431
|
+
|
|
432
|
+
An Ouro-Tools index can be built on-the-fly from the input genome, transcriptome, and gene annotation files. For example, below is the list of files that were used for the pre-built Ouro-Tools index "<u>*[Human (GRCh38, Ensembl version 105)](https://ouro-tools.s3.amazonaws.com/index/latest/Homo_sapiens.GRCh38.105.v0.2.4.tar)*</u>".
|
|
433
|
+
|
|
434
|
+
|
|
435
|
+
|
|
436
|
+
*required annotations* (*Ensembl version 105*):
|
|
437
|
+
|
|
438
|
+
* **path_file_fa_genome** : https://ftp.ensembl.org/pub/release-105/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
|
|
439
|
+
* A genome FASTA file. Either gzipped or plain FASTA file can be accepted.
|
|
440
|
+
* **path_file_gtf_genome** : https://ftp.ensembl.org/pub/release-105/gtf/homo_sapiens/Homo_sapiens.GRCh38.105.gtf.gz
|
|
441
|
+
* A GTF file. Either gzipped or plain GTF file can be accepted. Currently GFF3 format files are not supported.
|
|
442
|
+
* The following arguments can be used to set the attribute names that identify gene and transcript annotations in the attributes column.
|
|
443
|
+
* str_name_gtf_attr_for_id_gene : (default: '**gene_id**')
|
|
444
|
+
* str_name_gtf_attr_for_name_gene : (default: '**gene_name**')
|
|
445
|
+
* str_name_gtf_attr_for_id_transcript : (default: '**transcript_id**')
|
|
446
|
+
* str_name_gtf_attr_for_name_transcript : (default: '**transcript_name**')
|
|
447
|
+
* An example of a GTF annotation file for gene annotations:
|
|
448
|
+
|
|
449
|
+
```
|
|
450
|
+
1 ensembl_havana gene 1211340 1214153 . - . gene_id "ENSG00000186827"; gene_version "11"; gene_name "TNFRSF4"; gene_source "ensembl_havana"; gene_biotype "protein_coding";
|
|
451
|
+
1 ensembl_havana transcript 1211340 1214153 . - . gene_id "ENSG00000186827"; gene_version "11"; transcript_id "ENST00000379236"; transcript_version "4"; gene_name "TNFRSF4"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "TNFRSF4-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS11"; tag "basic"; transcript_support_level "1 (assigned to previous version 3)";
|
|
452
|
+
1 ensembl_havana exon 1213983 1214153 . - . gene_id "ENSG00000186827"; gene_version "11"; transcript_id "ENST00000379236"; transcript_version "4"; exon_number "1"; gene_name "TNFRSF4"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "TNFRSF4-201"; transcript_source "ensembl_havana"; transcript_biotype "protein_coding"; tag "CCDS"; ccds_id "CCDS11"; exon_id "ENSE00001832731"; exon_version "2"; tag "basic"; transcript_support_level "1 (assigned to previous version 3)";
|
|
453
|
+
```
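
As an illustration of how the attribute names above select values from the attributes column, here is a minimal Python sketch. It is not taken from the Ouro-Tools source: the function names `parse_gtf_attributes` and `gene_and_transcript_ids` are hypothetical, and the sketch assumes a standard tab-delimited, nine-column GTF line with `key "value";` attributes.

```python
import re

# Matches key "value"; pairs in the GTF attributes column (column 9).
_GTF_ATTR = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_attributes(attr_column):
    """Return a dict mapping each attribute name to the list of its values.

    Values are kept in a list because some keys (e.g. 'tag') can repeat.
    """
    attrs = {}
    for key, value in _GTF_ATTR.findall(attr_column):
        attrs.setdefault(key, []).append(value)
    return attrs


def gene_and_transcript_ids(
    gtf_line,
    attr_id_gene="gene_id",
    attr_name_gene="gene_name",
    attr_id_transcript="transcript_id",
    attr_name_transcript="transcript_name",
):
    """Extract gene/transcript identifiers from one GTF line (hypothetical helper).

    The keyword defaults mirror the documented default attribute names.
    Missing attributes (e.g. transcript_id on a 'gene' record) yield None.
    """
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GTF columns")
    attrs = parse_gtf_attributes(fields[8])

    def first(name):
        values = attrs.get(name)
        return values[0] if values else None

    return {
        "id_gene": first(attr_id_gene),
        "name_gene": first(attr_name_gene),
        "id_transcript": first(attr_id_transcript),
        "name_transcript": first(attr_name_transcript),
    }
```

For a GTF file that uses different attribute names, the corresponding keyword arguments would be changed in the same way the `str_name_gtf_attr_for_*` arguments are.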
|
|
454
|
+
|
|
455
|
+
* **path_file_fa_transcriptome** : https://ftp.ensembl.org/pub/release-105/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
|
|
456
|
+
* A transcriptome FASTA file. Both gzipped and plain FASTA files are accepted.
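
The FASTA and GTF inputs above may be gzipped or plain. One common way to handle both transparently is to check the gzip magic bytes before opening the file. The sketch below is only an illustration of that approach; `open_text` is a hypothetical helper, not a function from Ouro-Tools, and how the package itself detects compression is not stated here.

```python
import gzip


def open_text(path):
    """Open a plain or gzipped text file for reading (hypothetical helper).

    Detection uses the two-byte gzip magic number rather than the file
    extension, so a mislabelled file is still handled correctly.
    """
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")
```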
|
|
457
|
+
|
|
458
|
+
|
|
459
|
+
|
|
460
|
+
#### *optional input annotations* <a name="optional-input-annotations"></a>
|
|
461
|
+
|
|
462
|
+
* **path_file_tsv_repeatmasker_ucsc** : [Table Browser (ucsc.edu)](https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=1576143313_LetmEyQf9yggiQJAXajCua4TGOGl&clade=mammal&org=Human&db=hg38&hgta_group=rep&hgta_track=knownGene&hgta_table=0&hgta_regionType=genome&position=chr2%3A25%2C160%2C915-25%2C168%2C903&hgta_outputType=primaryTable&hgta_outFileName=GRCh38_RepeatMasker.tsv.gz) [click "get output" to download the annotation]
|
|
463
|
+
|
|
464
|
+
* RepeatMasker annotations from the UCSC Table Browser.
|
|
465
|
+
|
|
466
|
+
* **path_file_gff_regulatory_element** : https://ftp.ensembl.org/pub/current_regulation/homo_sapiens/homo_sapiens.GRCh38.Regulatory_Build.regulatory_features.20221007.gff.gz
|
|
467
|
+
|
|
468
|
+
* The latest regulatory build from **Ensembl**.
|
|
469
|
+
* Annotations from other sources, or custom annotations, can also be used. Currently only the GFF3 file format is supported (with the **.gff** extension).
|
|
470
|
+
* The following argument can be used to set the attribute name for identifying regulatory regions:
|
|
471
|
+
* str_name_gff_attr_id_regulatory_element : (default: '**ID**')
|
|
472
|
+
|
|
473
|
+
* An example of a GFF annotation file for regulatory elements:
|
|
474
|
+
|
|
475
|
+
```
|
|
476
|
+
18 Regulatory_Build enhancer 35116801 35120999 . . . ID=enhancer:ENSR00000572865;bound_end=35120999;bound_start=35116801;description=Predicted enhancer region;feature_type=Enhancer
|
|
477
|
+
8 Regulatory_Build TF_binding_site 37967115 37967453 . . . ID=TF_binding_site:ENSR00001137252;bound_end=37967531;bound_start=37966339;description=Transcription factor binding site;feature_typ
|
|
478
|
+
6 Regulatory_Build enhancer 90249202 90257999 . . . ID=enhancer:ENSR00000798348;bound_end=90257999;bound_start=90249202;description=Predicted enhancer region;feature_type=Enhancer
|
|
479
|
+
3 Regulatory_Build CTCF_binding_site 57689401 57689600 . . . ID=CTCF_binding_site:ENSR00000687477;bound_end=57689600;bound_start=57689401;description=CTCF binding site;feature_type=CTCF
|
|
480
|
+
```
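
To show how the `ID` attribute (or another attribute chosen through `str_name_gff_attr_id_regulatory_element`) can be read from such a record, here is a minimal Python sketch. The function names are hypothetical and not part of Ouro-Tools; the sketch assumes tab-delimited GFF3 lines whose ninth column holds `key=value` pairs separated by semicolons.

```python
from urllib.parse import unquote


def parse_gff3_attributes(attr_column):
    """Split a GFF3 attributes column into a dict of key -> value.

    GFF3 percent-encodes reserved characters, so keys and values are decoded.
    Items without '=' are skipped.
    """
    attrs = {}
    for item in attr_column.strip().split(";"):
        if "=" not in item:
            continue
        key, value = item.split("=", 1)
        attrs[unquote(key)] = unquote(value)
    return attrs


def regulatory_element_id(gff_line, attr_id="ID"):
    """Return the identifier of a regulatory element record (hypothetical helper).

    'attr_id' defaults to 'ID', mirroring the documented default.
    Returns None when the attribute is absent.
    """
    fields = gff_line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GFF3 columns")
    return parse_gff3_attributes(fields[8]).get(attr_id)
```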
|
|
481
|
+
|
|
482
|
+
|
|
483
|
+
|
|
484
|
+
|
|
485
|
+
|
|
486
|
+
## SAM Tags <a name="SAM-tags"></a>
|
|
487
|
+
|
|
488
|
+
| *SAM tag name* | *data type* | *Description* | *Module name* |
|
|
489
|
+
| -------------- | ----------- | ------------------------------------------------------------ | --------------------------- |
|
|
490
|
+
| *CB* | Z | the corrected cell barcode sequence | Barcode Extraction |
|
|
491
|
+
| *UB* | Z | the corrected UMI sequence after the UMI clustering process | Barcode Extraction |
|
|
492
|
+
| *UR* | Z | the uncorrected UMI sequence before the UMI clustering process | Barcode Extraction |
|
|
493
|
+
| *XR* | i | the number of errors for identification of R1 adapter (marks the 3’ end of cDNA). -1 indicates that the adapter was not identified | Barcode Extraction |
|
|
494
|
+
| *XT* | i | the number of errors for identification of TSO adapter (marks the 5’ end of cDNA). -1 indicates that the adapter was not identified | Barcode Extraction |
|
|
495
|
+
| *CU* | Z | the uncorrected raw CB-UMI sequence | Barcode Extraction |
|
|
496
|
+
| *IA*           | i           | the length of the internal poly(A) tract detected on the genome | Barcode Extraction          |
|
|
497
|
+
| *LE* | i | the total number of genome-aligned base pairs | Barcode Extraction |
|
|
498
|
+
| *AG* | i | the number of consecutive G nucleotides, starting from the 5’ site in the aligned region of the read | Full-Length Identification |
|
|
499
|
+
| *UG* | i | the number of consecutive G nucleotides, starting from the 5’ site in the unaligned region of the read (soft-clipped sequence) | Full-Length Identification |
|
|
500
|
+
| *VS*           | i           | “1” if the 5’ site is identified as a valid transcript start site (TSS); “0” if it is identified as an invalid TSS, i.e. a 5’ site arising from PCR/RT artifacts (including 5’ degradation products of full-length transcripts) | Full-Length Identification  |
|
|
501
|
+
| *AU* | i | the inferred number of unreferenced G nucleotides aligned to the genome | Full-Length Identification |
|
|
502
|
+
| *XC* | i | the bitwise flags (see the table below for more details) | Single-Cell Count |
|
|
503
|
+
| *XR* | Z | the repeat element ID | Single-Cell Count |
|
|
504
|
+
| *YR* | i | the total number of base pairs overlapping with the repeat element to which the read is confidently assigned | Single-Cell Count |
|
|
505
|
+
| *XG* | Z | the gene ID | Single-Cell Count |
|
|
506
|
+
| *YG* | i | the total number of base pairs overlapping with the exons of the gene to which the read is confidently assigned | Single-Cell Count |
|
|
507
|
+
| *XP* | Z | the promoter ID | Single-Cell Count |
|
|
508
|
+
| *YX* | i | the total number of base pairs overlapping with any exons that overlap with the read | Single-Cell Count |
|
|
509
|
+
| *YF* | i | the total number of base pairs overlapping with any repeat elements that overlap with the read (considering only filtered repeat elements) | Single-Cell Count |
|
|
510
|
+
| *XE* | Z | the regulatory element ID | Single-Cell Count |
|
|
511
|
+
| *YU* | i | the total number of base pairs overlapping with any repeat elements that overlap with the read (considering all repeat elements) | Single-Cell Count |
|
|
512
|
+
| *YE* | i | the total number of base pairs overlapping with any regulatory elements that overlap with the read | Single-Cell Count |
|
|
513
|
+
| *XT* | Z | the transcript ID that is uniquely assigned to the read using the re-alignment process | Single-Cell Count |
|
|
514
|
+
| *ZF* | i | the flag that indicates the read represents a full-length cDNA with valid 3’ and 5’ ends | Single-Cell Count |
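
The tags in the table follow the standard SAM optional-field layout `TAG:TYPE:VALUE`, where type `Z` is a string and type `i` is an integer. The Python sketch below reads them from a single SAM text line using only the standard library. `parse_sam_tags` is a hypothetical helper, and the example read in the usage note is invented for illustration. Because the table lists *XR* and *XT* with different data types for different modules, the sketch keeps the type code next to each value so the two meanings can be told apart.

```python
def parse_sam_tags(sam_line):
    """Return {tag: (type_code, value)} for the optional fields of a SAM line.

    The first 11 tab-separated columns of a SAM record are mandatory;
    everything after them is an optional TAG:TYPE:VALUE field.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        # Split at most twice: string values may themselves contain ':'.
        tag, type_code, value = field.split(":", 2)
        if type_code == "i":
            value = int(value)
        elif type_code == "f":
            value = float(value)
        tags[tag] = (type_code, value)
    return tags
```

For example, a record ending in `CB:Z:ACGT` and `XR:i:-1` yields `("Z", "ACGT")` for `CB` and `("i", -1)` for `XR`, the latter meaning that the R1 adapter was not identified.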
|
|
515
|
+
|
|
516
|
+
|
|
517
|
+
|
|
518
|
+
## Bitwise flags <a name="bitwise-flags"></a>
|
|
519
|
+
|
|
520
|
+
| *Binary flag* | *Feature type* | *Description* |
|
|
521
|
+
| ------------- | -------------- | ------------------------------------------------------------ |
|
|
522
|
+
| 0x1 | gene | overlaps with gene(s) |
|
|
523
|
+
| 0x2 | gene | gene assignment is ambiguous |
|
|
524
|
+
| 0x4 | gene | completely intronic reads (GEX mode specific) |
|
|
525
|
+
| 0x8 | gene | exonic reads (GEX mode specific) |
|
|
526
|
+
| 0x10 | promoter | overlaps with promoter region(s) (ATAC mode specific) |
|
|
527
|
+
| 0x20 | promoter | promoter assignment is ambiguous (ATAC mode specific) |
|
|
528
|
+
| 0x40 | repeats | overlaps with repeat element(s) |
|
|
529
|
+
| 0x80          | repeats        | ambiguous assignment to two or more repeat elements          |
|
|
530
|
+
| 0x100 | repeats | the entire length of a read overlaps with a single repeat element |
|
|
531
|
+
| 0x200 | regulatory | overlaps with regulatory element(s) |
|
|
532
|
+
| 0x400 | regulatory | overlaps with both repeat element(s) and regulatory element(s) |
|
|
533
|
+
| 0x800 | regulatory | overlaps exclusively with regulatory element(s) (no overlaps with repeat element) |
|
|
534
|
+
| 0x1000        | regulatory     | ambiguous assignment to two or more regulatory elements      |
|
|
535
|
+
| 0x2000 | regulatory | the entire length of a read overlaps with a single regulatory element |
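
The value of the *XC* tag is a bitwise OR of the flags in the table, so each bit can be tested independently with a bitwise AND. A minimal Python sketch of decoding such a value follows. The dictionary copies the flag values and descriptions from the table; the names `XC_FLAGS` and `decode_xc` are hypothetical and not part of Ouro-Tools.

```python
# Flag values and descriptions copied from the bitwise-flags table.
XC_FLAGS = {
    0x1: "gene: overlaps with gene(s)",
    0x2: "gene: gene assignment is ambiguous",
    0x4: "gene: completely intronic reads (GEX mode specific)",
    0x8: "gene: exonic reads (GEX mode specific)",
    0x10: "promoter: overlaps with promoter region(s) (ATAC mode specific)",
    0x20: "promoter: promoter assignment is ambiguous (ATAC mode specific)",
    0x40: "repeats: overlaps with repeat element(s)",
    0x80: "repeats: ambiguous assignment to two or more repeat elements",
    0x100: "repeats: the entire length of a read overlaps with a single repeat element",
    0x200: "regulatory: overlaps with regulatory element(s)",
    0x400: "regulatory: overlaps with both repeat element(s) and regulatory element(s)",
    0x800: "regulatory: overlaps exclusively with regulatory element(s)",
    0x1000: "regulatory: ambiguous assignment to two or more regulatory elements",
    0x2000: "regulatory: the entire length of a read overlaps with a single regulatory element",
}


def decode_xc(flag_value):
    """Return the descriptions of all flags set in an XC value, lowest bit first."""
    return [desc for bit, desc in XC_FLAGS.items() if flag_value & bit]


def has_flag(flag_value, bit):
    """Return True if the given flag bit is set in the XC value."""
    return bool(flag_value & bit)
```

For instance, `decode_xc(0x1 | 0x8)` returns the descriptions for "overlaps with gene(s)" and "exonic reads", and `has_flag(value, 0x2)` checks whether the gene assignment was ambiguous.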
|
|
536
|
+
|
|
537
|
+
|
|
538
|
+
|
|
539
|
+
|
|
540
|
+
|
|
541
|
+
---------------
|
|
542
|
+
|
|
543
|
+
Ouro-Tools was developed by Hyunsu An and Chaemin Lim at Gwangju Institute of Science and Technology under the supervision of Professor Jihwan Park.
|
|
544
|
+
|
|
545
|
+
© 2024 Functional Genomics Lab, Gwangju Institute of Science and Technology
|