offtracker 2.7.8__zip → 2.7.10__zip
- offtracker-2.7.10/PKG-INFO +189 -0
- offtracker-2.7.10/README.md +177 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker/X_offplot.py +25 -7
- offtracker-2.7.10/offtracker/_version.py +30 -0
- offtracker-2.7.10/offtracker.egg-info/PKG-INFO +189 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker.egg-info/SOURCES.txt +2 -1
- {offtracker-2.7.8 → offtracker-2.7.10}/scripts/offtracker_analysis.py +12 -4
- offtracker-2.7.10/scripts/offtracker_plot.py +39 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/setup.py +4 -1
- offtracker-2.7.8/PKG-INFO +0 -146
- offtracker-2.7.8/README.md +0 -134
- offtracker-2.7.8/offtracker/_version.py +0 -28
- offtracker-2.7.8/offtracker.egg-info/PKG-INFO +0 -146
- {offtracker-2.7.8 → offtracker-2.7.10}/LICENSE.txt +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/MANIFEST.in +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker/X_offtracker.py +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker/X_sequence.py +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker/__init__.py +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker/mapping/1.1_bed2fr_v4.5.py +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker/mapping/1.3_bdg_normalize_v4.0.py +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker/mapping/Snakefile_offtracker +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker/mapping/bedGraphToBigWig +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker/mapping/hg38.chrom.sizes +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker/mapping/mm10.chrom.sizes +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker/mapping/offtracker_blacklist_hg38.merged.bed +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker/mapping/offtracker_blacklist_mm10.merged.bed +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker.egg-info/dependency_links.txt +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker.egg-info/requires.txt +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/offtracker.egg-info/top_level.txt +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/scripts/offtracker_candidates.py +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/scripts/offtracker_config.py +0 -0
- {offtracker-2.7.8 → offtracker-2.7.10}/setup.cfg +0 -0
@@ -0,0 +1,189 @@
|
|
1
|
+
Metadata-Version: 2.1
|
2
|
+
Name: offtracker
|
3
|
+
Version: 2.7.10
|
4
|
+
Summary: Tracking-seq data analysis
|
5
|
+
Home-page: https://github.com/Lan-lab/offtracker
|
6
|
+
Author: Runda Xu
|
7
|
+
Author-email: runda.xu@foxmail.com
|
8
|
+
Requires-Python: >=3.6.0
|
9
|
+
Description-Content-Type: text/markdown
|
10
|
+
License-File: LICENSE.txt
|
11
|
+
|
12
|
+
|
13
|
+
# OFF-TRACKER
|
14
|
+
|
15
|
+
OFF-TRACKER is an end to end pipeline of Tracking-seq data analysis for detecting off-target sites of any genome editing tools that generate double-strand breaks (DSBs) or single-strand breaks (SSBs).
|
16
|
+
|
17
|
+
## System requirements
|
18
|
+
|
19
|
+
* Linux/Unix
|
20
|
+
* Python >= 3.6
|
21
|
+
|
22
|
+
## Dependency
|
23
|
+
|
24
|
+
```bash
|
25
|
+
# We recommend creating a new environment using mamba/conda to avoid compatibility problems
|
26
|
+
# If you don't use mamba, just replace the code with conda
|
27
|
+
mamba create -n offtracker -c bioconda blast snakemake pybedtools
|
28
|
+
```
|
29
|
+
|
30
|
+
|
31
|
+
## Installation
|
32
|
+
|
33
|
+
```bash
|
34
|
+
# Activate the environment
|
35
|
+
conda activate offtracker
|
36
|
+
|
37
|
+
# Direct installation with pip
|
38
|
+
pip install offtracker
|
39
|
+
|
40
|
+
# (Alternative) Download the offtracker from github
|
41
|
+
git clone https://github.com/Lan-lab/offtracker.git
|
42
|
+
cd offtracker
|
43
|
+
pip install .
|
44
|
+
```
|
45
|
+
|
46
|
+
|
47
|
+
## Before analyzing samples
|
48
|
+
|
49
|
+
```bash
|
50
|
+
# Build blast index (only need once for each genome)
|
51
|
+
makeblastdb -input_type fasta -title hg38 -dbtype nucl -parse_seqids \
|
52
|
+
-in /Your_Path_To_Reference/hg38_genome.fa \
|
53
|
+
-out /Your_Path_To_Reference/hg38_genome.blastdb \
|
54
|
+
-logfile /Your_Path_To_Reference/hg38_genome.blastdb.log
|
55
|
+
|
56
|
+
# Build chromap index (only need once for each genome)
|
57
|
+
chromap -i -r /Your_Path_To_Reference/hg38_genome.fa \
|
58
|
+
-o /Your_Path_To_Reference/hg38_genome.chromap.index
|
59
|
+
|
60
|
+
# Generate candidate regions by sgRNA sequence (need once for each genome and sgRNA)
|
61
|
+
# --name: the name of the sgRNA, which will be used in the following analysis
|
62
|
+
offtracker_candidates.py -t 8 -g hg38 \
|
63
|
+
-r /Your_Path_To_Reference/hg38_genome.fa \
|
64
|
+
-b /Your_Path_To_Reference/hg38_genome.blastdb \
|
65
|
+
--name 'VEGFA2' --sgrna 'GACCCCCTCCACCCCGCCTC' --pam 'NGG' \
|
66
|
+
-o /Your_Path_To_Candidates
|
67
|
+
|
68
|
+
```
|
69
|
+
|
70
|
+
## Strand-specific mapping of Tracking-seq data
|
71
|
+
|
72
|
+
```bash
|
73
|
+
# Generate snakemake config file
|
74
|
+
# --subfolder: If different samples are in separate folders, set this to 1
|
75
|
+
# if -o is not set, the output will be in the same folder as the fastq files
|
76
|
+
offtracker_config.py -t 8 -g hg38 --blacklist hg38 \
|
77
|
+
-r /Your_Path_To_Reference/hg38_genome.fa \
|
78
|
+
-i /Your_Path_To_Reference/hg38_genome.chromap.index \
|
79
|
+
-f /Your_Path_To_Fastq \
|
80
|
+
-o /Your_Path_To_Output \
|
81
|
+
--subfolder 0
|
82
|
+
|
83
|
+
# Run the snakemake program
|
84
|
+
cd /Your_Path_To_Fastq
|
85
|
+
snakemake -np # dry run
|
86
|
+
nohup snakemake --cores 16 1>snakemake.log 2>snakemake.err &
|
87
|
+
|
88
|
+
## about cores
|
89
|
+
# --cores of snakemake must be larger than -t of offtracker_config.py
|
90
|
+
# parallel number = cores/t
|
91
|
+
|
92
|
+
## about output
|
93
|
+
# This part will generate "*.fw.scaled.bw" and "*.rv.scaled.bw" for IGV visualization
|
94
|
+
# "*.fw.bed" and "*.rv.bed" are used in the next part.
|
95
|
+
```
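The note on cores above can be sketched in Python. The function name `parallel_jobs` is hypothetical and not part of offtracker; it only illustrates the stated rule `parallel number = cores/t`, assuming integer division and reading "larger than" as "at least".

```python
def parallel_jobs(cores: int, threads_per_job: int) -> int:
    """Number of jobs that can run at once, assuming each job reserves
    `threads_per_job` of the `cores` handed to snakemake (hypothetical helper)."""
    if threads_per_job <= 0:
        raise ValueError("threads_per_job must be positive")
    if cores < threads_per_job:
        # Assumption: --cores should be at least as large as -t
        raise ValueError("--cores must not be smaller than -t")
    return cores // threads_per_job


print(parallel_jobs(16, 8))
```

With `--cores 16` and `-t 8` this gives two parallel tasks, which matches the example-data timing note in this README.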
|
96
|
+
|
97
|
+
|
98
|
+
## Analyzing the genome-wide off-target sites
|
99
|
+
|
100
|
+
```bash
|
101
|
+
# In this part, multiple samples in the same condition can be analyzed in a single run by pattern matching on sample names
|
102
|
+
|
103
|
+
offtracker_analysis.py -g hg38 --name "VEGFA2" \
|
104
|
+
--exp 'Cas9_VEGFA2' \
|
105
|
+
--control 'WT' \
|
106
|
+
--outname 'Cas9_VEGFA_293' \
|
107
|
+
-f /Your_Path_To_Output \
|
108
|
+
--seqfolder /Your_Path_To_Candidates
|
109
|
+
|
110
|
+
# --name: the same gRNA name you set when running offtracker_candidates.py
|
111
|
+
# --exp/--control: add one or multiple patterns of file name in regular expressions
|
112
|
+
# If multiple samples meet the pattern, their signals will be averaged. Thus, only samples with the same condition should be included in a single analysis.
|
113
|
+
|
114
|
+
# This step will generate Offtracker_result_{outname}.csv
|
115
|
+
# Default FDR is 0.05, which can be changed by --fdr. This will empirically make the threshold of Track score around 2.
|
116
|
+
# Sites with Track score >=2, which is an empirical threshold, are output regardless of FDR.
|
117
|
+
# Intermediate files are saved in ./temp folder, which can be deleted.
|
118
|
+
# Keeping the intermediate files can make the analysis faster if involving previously analyzed samples (e.g. using the same control samples for different analyses)
|
119
|
+
```
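The output rule described in the comments above (keep a site if it passes the FDR cutoff or reaches the empirical Track score threshold) can be sketched as follows. This is a plain-Python illustration, not the package's implementation: the helper `select_sites`, the demo rows, and the use of `<=` for the FDR comparison are assumptions, and the keys `fdr` and `track_score` are borrowed from names that appear elsewhere in this diff.

```python
def select_sites(rows, fdr_thresh=0.05, score_thresh=2.0):
    """Keep rows that pass the FDR cutoff OR reach the Track score cutoff,
    sorted by Track score in descending order (hypothetical helper)."""
    kept = [
        r for r in rows
        if r["fdr"] <= fdr_thresh or r["track_score"] >= score_thresh
    ]
    return sorted(kept, key=lambda r: r["track_score"], reverse=True)


# Made-up demo rows, for illustration only
demo = [
    {"target_location": "site_A", "track_score": 5.1, "fdr": 0.001},
    {"target_location": "site_B", "track_score": 2.3, "fdr": 0.20},
    {"target_location": "site_C", "track_score": 0.4, "fdr": 0.60},
]
selected = select_sites(demo)
print([r["target_location"] for r in selected])
```

Here `site_B` is kept despite its FDR because its Track score is at least 2, which is the "regardless of FDR" case mentioned above.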
|
120
|
+
|
121
|
+
## Off-target sequences visualization
|
122
|
+
|
123
|
+
```bash
|
124
|
+
# After getting the Offtracker_result_{outname}.csv, you can visualize the off-target sites and their genomic sequences with the following command:
|
125
|
+
|
126
|
+
offtracker_plot.py --result Your_Offtracker_Result_CSV \
|
127
|
+
--sgrna 'GACCCCCTCCACCCCGCCTC' --pam 'NGG'
|
128
|
+
|
129
|
+
# The default output is a pdf file with Offtracker_result_{outname}.pdf
|
130
|
+
# Change the suffix of the output file to change the format (e.g.: .png)
|
131
|
+
# The orange dash line indicates the empirical threshold of Track score = 2
|
132
|
+
# Empirically, the off-target sites with Track score < 2 are less likely to be real off-target sites.
|
133
|
+
```
|
134
|
+
|
135
|
+
|
136
|
+
## Note1
|
137
|
+
|
138
|
+
The default setting only includes chr1-chr22, chrX, chrY, and chrM. Please make sure the reference genome contains "chr" at the beginning.
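One way to check the "chr" prefix requirement before building any index is to scan the FASTA header lines. The sketch below is a hypothetical helper, not part of offtracker; it assumes the sequence name is the first whitespace-delimited token after `>`.

```python
def non_chr_headers(fasta_lines):
    """Return sequence names from FASTA header lines that lack a 'chr' prefix."""
    bad = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if not name.startswith("chr"):
                bad.append(name)
    return bad


# Made-up miniature FASTA content, for illustration only
demo_lines = [">chr1", "ACGT", ">2 dna:chromosome", "GGCC", ">chrM", "TTAA"]
print(non_chr_headers(demo_lines))
```

For a real genome, the same function could be called on an open file handle, since iterating over a file yields its lines.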
|
139
|
+
|
140
|
+
Currently, this software is only ready-to-use for mm10 and hg38. For any other genome, e.g., hg19, please add a genome size file named "hg19.chrom.sizes" to ./offtracker/mapping and install manually. In addition, add "--blacklist none" or "--blacklist Your_Blacklist" (e.g., ENCODE blacklist) when running offtracker_config.py, because we only provide blacklists for mm10 and hg38.
|
141
|
+
|
142
|
+
If you have a requirement for species other than human/mouse, please post an issue.
|
143
|
+
|
144
|
+
## Note2
|
145
|
+
|
146
|
+
The FDRs in the Tracking-seq result do not reflect the real off-target probability.
|
147
|
+
It is strongly recommended to load the "fw.scaled.bw" and "rv.scaled.bw" files in a genome browser such as IGV and visually inspect each target location from the Tracking-seq result.
|
148
|
+
|
149
|
+
|
150
|
+
|
151
|
+
# Example Data
|
152
|
+
|
153
|
+
Here are example data that contains reads of chr6 from HEK293T cells edited with Cas9 + sgRNA VEGFA2 and wild type cells:
|
154
|
+
|
155
|
+
https://figshare.com/articles/dataset/WT_HEK239T_chr6/25956034
|
156
|
+
|
157
|
+
It takes about 5-10 minutes to run the mapping (offtracker_config.py & snakemake) of example data with -t 8 and --cores 16 (2 parallel tasks)
|
158
|
+
|
159
|
+
## Signal visualization
|
160
|
+
|
161
|
+
After mapping, there will be 4 .bw files in the output folder:
|
162
|
+
```bash
|
163
|
+
Cas9_VEGFA2_chr6.fw.scaled.bw
|
164
|
+
|
165
|
+
Cas9_VEGFA2_chr6.rv.scaled.bw
|
166
|
+
|
167
|
+
WT_chr6.fw.scaled.bw
|
168
|
+
|
169
|
+
WT_chr6.rv.scaled.bw
|
170
|
+
```
|
171
|
+
These files can be visualized in genome browser like IGV:
|
172
|
+
|
173
|
+

|
174
|
+
|
175
|
+
|
176
|
+
## Whole genome off-target analysis
|
177
|
+
|
178
|
+
For analyzing the signals (offtracker_analysis.py), it takes about 3-5 minutes and outputs a file named "Offtracker_result_{outname}.csv"
|
179
|
+
|
180
|
+
After that, you can visualize the off-target sites with their genomic sequence (offtracker_plot.py) and get an image like this:
|
181
|
+
|
182
|
+

|
183
|
+
|
184
|
+
# Citation
|
185
|
+
|
186
|
+
|
187
|
+
|
188
|
+
|
189
|
+
|
@@ -0,0 +1,177 @@
|
|
1
|
+
# OFF-TRACKER
|
2
|
+
|
3
|
+
OFF-TRACKER is an end to end pipeline of Tracking-seq data analysis for detecting off-target sites of any genome editing tools that generate double-strand breaks (DSBs) or single-strand breaks (SSBs).
|
4
|
+
|
5
|
+
## System requirements
|
6
|
+
|
7
|
+
* Linux/Unix
|
8
|
+
* Python >= 3.6
|
9
|
+
|
10
|
+
## Dependency
|
11
|
+
|
12
|
+
```bash
|
13
|
+
# We recommend creating a new environment using mamba/conda to avoid compatibility problems
|
14
|
+
# If you don't use mamba, just replace the code with conda
|
15
|
+
mamba create -n offtracker -c bioconda blast snakemake pybedtools
|
16
|
+
```
|
17
|
+
|
18
|
+
|
19
|
+
## Installation
|
20
|
+
|
21
|
+
```bash
|
22
|
+
# Activate the environment
|
23
|
+
conda activate offtracker
|
24
|
+
|
25
|
+
# Direct installation with pip
|
26
|
+
pip install offtracker
|
27
|
+
|
28
|
+
# (Alternative) Download the offtracker from github
|
29
|
+
git clone https://github.com/Lan-lab/offtracker.git
|
30
|
+
cd offtracker
|
31
|
+
pip install .
|
32
|
+
```
|
33
|
+
|
34
|
+
|
35
|
+
## Before analyzing samples
|
36
|
+
|
37
|
+
```bash
|
38
|
+
# Build blast index (only need once for each genome)
|
39
|
+
makeblastdb -input_type fasta -title hg38 -dbtype nucl -parse_seqids \
|
40
|
+
-in /Your_Path_To_Reference/hg38_genome.fa \
|
41
|
+
-out /Your_Path_To_Reference/hg38_genome.blastdb \
|
42
|
+
-logfile /Your_Path_To_Reference/hg38_genome.blastdb.log
|
43
|
+
|
44
|
+
# Build chromap index (only need once for each genome)
|
45
|
+
chromap -i -r /Your_Path_To_Reference/hg38_genome.fa \
|
46
|
+
-o /Your_Path_To_Reference/hg38_genome.chromap.index
|
47
|
+
|
48
|
+
# Generate candidate regions by sgRNA sequence (need once for each genome and sgRNA)
|
49
|
+
# --name: the name of the sgRNA, which will be used in the following analysis
|
50
|
+
offtracker_candidates.py -t 8 -g hg38 \
|
51
|
+
-r /Your_Path_To_Reference/hg38_genome.fa \
|
52
|
+
-b /Your_Path_To_Reference/hg38_genome.blastdb \
|
53
|
+
--name 'VEGFA2' --sgrna 'GACCCCCTCCACCCCGCCTC' --pam 'NGG' \
|
54
|
+
-o /Your_Path_To_Candidates
|
55
|
+
|
56
|
+
```
|
57
|
+
|
58
|
+
## Strand-specific mapping of Tracking-seq data
|
59
|
+
|
60
|
+
```bash
|
61
|
+
# Generate snakemake config file
|
62
|
+
# --subfolder: If different samples are in separate folders, set this to 1
|
63
|
+
# if -o is not set, the output will be in the same folder as the fastq files
|
64
|
+
offtracker_config.py -t 8 -g hg38 --blacklist hg38 \
|
65
|
+
-r /Your_Path_To_Reference/hg38_genome.fa \
|
66
|
+
-i /Your_Path_To_Reference/hg38_genome.chromap.index \
|
67
|
+
-f /Your_Path_To_Fastq \
|
68
|
+
-o /Your_Path_To_Output \
|
69
|
+
--subfolder 0
|
70
|
+
|
71
|
+
# Run the snakemake program
|
72
|
+
cd /Your_Path_To_Fastq
|
73
|
+
snakemake -np # dry run
|
74
|
+
nohup snakemake --cores 16 1>snakemake.log 2>snakemake.err &
|
75
|
+
|
76
|
+
## about cores
|
77
|
+
# --cores of snakemake must be larger than -t of offtracker_config.py
|
78
|
+
# parallel number = cores/t
|
79
|
+
|
80
|
+
## about output
|
81
|
+
# This part will generate "*.fw.scaled.bw" and "*.rv.scaled.bw" for IGV visualization
|
82
|
+
# "*.fw.bed" and "*.rv.bed" are used in the next part.
|
83
|
+
```
|
84
|
+
|
85
|
+
|
86
|
+
## Analyzing the genome-wide off-target sites
|
87
|
+
|
88
|
+
```bash
|
89
|
+
# In this part, multiple samples in the same condition can be analyzed in a single run by pattern matching on sample names
|
90
|
+
|
91
|
+
offtracker_analysis.py -g hg38 --name "VEGFA2" \
|
92
|
+
--exp 'Cas9_VEGFA2' \
|
93
|
+
--control 'WT' \
|
94
|
+
--outname 'Cas9_VEGFA_293' \
|
95
|
+
-f /Your_Path_To_Output \
|
96
|
+
--seqfolder /Your_Path_To_Candidates
|
97
|
+
|
98
|
+
# --name: the same gRNA name you set when running offtracker_candidates.py
|
99
|
+
# --exp/--control: add one or multiple patterns of file name in regular expressions
|
100
|
+
# If multiple samples meet the pattern, their signals will be averaged. Thus, only samples with the same condition should be included in a single analysis.
|
101
|
+
|
102
|
+
# This step will generate Offtracker_result_{outname}.csv
|
103
|
+
# Default FDR is 0.05, which can be changed by --fdr. This will empirically make the threshold of Track score around 2.
|
104
|
+
# Sites with Track score >=2, which is an empirical threshold, are output regardless of FDR.
|
105
|
+
# Intermediate files are saved in ./temp folder, which can be deleted.
|
106
|
+
# Keeping the intermediate files can make the analysis faster if involving previously analyzed samples (e.g. using the same control samples for different analyses)
|
107
|
+
```
|
108
|
+
|
109
|
+
## Off-target sequences visualization
|
110
|
+
|
111
|
+
```bash
|
112
|
+
# After getting the Offtracker_result_{outname}.csv, you can visualize the off-target sites and their genomic sequences with the following command:
|
113
|
+
|
114
|
+
offtracker_plot.py --result Your_Offtracker_Result_CSV \
|
115
|
+
--sgrna 'GACCCCCTCCACCCCGCCTC' --pam 'NGG'
|
116
|
+
|
117
|
+
# The default output is a pdf file with Offtracker_result_{outname}.pdf
|
118
|
+
# Change the suffix of the output file to change the format (e.g.: .png)
|
119
|
+
# The orange dash line indicates the empirical threshold of Track score = 2
|
120
|
+
# Empirically, the off-target sites with Track score < 2 are less likely to be real off-target sites.
|
121
|
+
```
|
122
|
+
|
123
|
+
|
124
|
+
## Note1
|
125
|
+
|
126
|
+
The default setting only includes chr1-chr22, chrX, chrY, and chrM. Please make sure the reference genome contains "chr" at the beginning.
|
127
|
+
|
128
|
+
Currently, this software is only ready-to-use for mm10 and hg38. For any other genome, e.g., hg19, please add a genome size file named "hg19.chrom.sizes" to ./offtracker/mapping and install manually. In addition, add "--blacklist none" or "--blacklist Your_Blacklist" (e.g., ENCODE blacklist) when running offtracker_config.py, because we only provide blacklists for mm10 and hg38.
|
129
|
+
|
130
|
+
If you have a requirement for species other than human/mouse, please post an issue.
|
131
|
+
|
132
|
+
## Note2
|
133
|
+
|
134
|
+
The FDRs in the Tracking-seq result do not reflect the real off-target probability.
|
135
|
+
It is strongly recommended to load the "fw.scaled.bw" and "rv.scaled.bw" files in a genome browser such as IGV and visually inspect each target location from the Tracking-seq result.
|
136
|
+
|
137
|
+
|
138
|
+
|
139
|
+
# Example Data
|
140
|
+
|
141
|
+
Here are example data that contains reads of chr6 from HEK293T cells edited with Cas9 + sgRNA VEGFA2 and wild type cells:
|
142
|
+
|
143
|
+
https://figshare.com/articles/dataset/WT_HEK239T_chr6/25956034
|
144
|
+
|
145
|
+
It takes about 5-10 minutes to run the mapping (offtracker_config.py & snakemake) of example data with -t 8 and --cores 16 (2 parallel tasks)
|
146
|
+
|
147
|
+
## Signal visualization
|
148
|
+
|
149
|
+
After mapping, there will be 4 .bw files in the output folder:
|
150
|
+
```bash
|
151
|
+
Cas9_VEGFA2_chr6.fw.scaled.bw
|
152
|
+
|
153
|
+
Cas9_VEGFA2_chr6.rv.scaled.bw
|
154
|
+
|
155
|
+
WT_chr6.fw.scaled.bw
|
156
|
+
|
157
|
+
WT_chr6.rv.scaled.bw
|
158
|
+
```
|
159
|
+
These files can be visualized in genome browser like IGV:
|
160
|
+
|
161
|
+

|
162
|
+
|
163
|
+
|
164
|
+
## Whole genome off-target analysis
|
165
|
+
|
166
|
+
For analyzing the signals (offtracker_analysis.py), it takes about 3-5 minutes and outputs a file named "Offtracker_result_{outname}.csv"
|
167
|
+
|
168
|
+
After that, you can visualize the off-target sites with their genomic sequence (offtracker_plot.py) and get an image like this:
|
169
|
+
|
170
|
+

|
171
|
+
|
172
|
+
# Citation
|
173
|
+
|
174
|
+
|
175
|
+
|
176
|
+
|
177
|
+
|
@@ -1,13 +1,22 @@
|
|
1
|
-
|
2
|
-
import matplotlib.patches as patches
|
1
|
+
|
3
2
|
import pandas as pd
|
4
3
|
import numpy as np
|
5
|
-
|
6
|
-
|
4
|
+
import matplotlib.pyplot as plt
|
5
|
+
import matplotlib.patches as patches
|
6
|
+
from matplotlib import rcParams
|
7
|
+
# Equivalent to using plt.rcParams or matplotlib.rcParams
|
8
|
+
dict_rc = {
|
9
|
+
'pdf.fonttype': 42,
|
10
|
+
'font.family': ['Arial']
|
11
|
+
}
|
12
|
+
rcParams.update(dict_rc)
|
13
|
+
|
14
|
+
# 2024.06.03. offtable: add a threshold dividing line; the default is None, and 2 is the commonly used value
|
15
|
+
def offtable(offtargets, target_guide, length_pam = 3,
|
7
16
|
col_seq='best_target', col_score='track_score', col_mismatch='mismatch', col_loc='target_location',
|
8
17
|
title=None, font='Arial', font_size=9,
|
9
|
-
box_size_x=15, box_size_y=20, box_gap=1,
|
10
|
-
x_offset=15, y_offset=35, dpi=
|
18
|
+
box_size_x=15, box_size_y=20, box_gap=1, threshold=None,
|
19
|
+
x_offset=15, y_offset=35, dpi=300, savefig=None):
|
11
20
|
# Facecolor
|
12
21
|
color_dict = {
|
13
22
|
'A': 'lightgreen',
|
@@ -21,6 +30,8 @@ def offtable(offtargets, target_guide,
|
|
21
30
|
|
22
31
|
# If offtargets is a DataFrame, convert to list of dictionaries
|
23
32
|
if isinstance(offtargets, pd.DataFrame):
|
33
|
+
if threshold is not None:
|
34
|
+
n_positive = sum(offtargets[col_score]>=threshold)
|
24
35
|
offtargets = offtargets.to_dict(orient='records')
|
25
36
|
|
26
37
|
# Configuration
|
@@ -100,11 +111,18 @@ def offtable(offtargets, target_guide,
|
|
100
111
|
ax.text(x_offset + (len(target_guide) + 4) * box_size_x, y + box_size_y / 2, seq[col_loc], ha='left', va='center', family=font, fontsize=font_size)
|
101
112
|
|
102
113
|
# add a vertical line to indicate the PAM
|
103
|
-
x_line = x_offset + (len(target_guide) -
|
114
|
+
x_line = x_offset + (len(target_guide) - length_pam) * box_size_x
|
104
115
|
y_start = y_offset # + box_size_y / 2
|
105
116
|
y_end = y_start + (len(offtargets)+1) * (box_size_y + box_gap)
|
106
117
|
ax.vlines(x=x_line, ymin=y_start, ymax=y_end, color='indianred', linestyle='--')
|
107
118
|
|
119
|
+
# 2024.06.03. add a horizontal line to indicate the threshold
|
120
|
+
if threshold is not None:
|
121
|
+
thresh_x_start = x_offset
|
122
|
+
thresh_x_end = x_offset + len(target_guide) * box_size_x
|
123
|
+
thresh_y = y_offset + (n_positive+1) * (box_size_y + box_gap) - box_gap*0.5
|
124
|
+
ax.hlines(y=thresh_y, xmin=thresh_x_start, xmax=thresh_x_end, color='orange', linestyle='--')
|
125
|
+
|
108
126
|
# Styling and save
|
109
127
|
ax.set_xlim(0, width*1.1) # the location text is long, so widen the x-axis a bit
|
110
128
|
ax.set_ylim(height, 0)
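The vertical position of the new threshold line follows directly from the two added hunks: `n_positive` counts the rows whose score reaches the threshold, and the line is drawn just below that many rows. The sketch below isolates that arithmetic. The helper name `threshold_line_y` is hypothetical, the default values are copied from the `offtable` signature, and it assumes the rows are already sorted by score in descending order and that the extra `+1` accounts for one leading row above the off-target rows.

```python
def threshold_line_y(scores, threshold, y_offset=35, box_size_y=20, box_gap=1):
    """y coordinate of the dashed threshold line, mirroring the expression
    used for thresh_y in the diff (hypothetical standalone helper)."""
    # Number of rows whose score reaches the threshold
    n_positive = sum(s >= threshold for s in scores)
    # Skip one leading row plus the positive rows, then step back half a gap
    return y_offset + (n_positive + 1) * (box_size_y + box_gap) - box_gap * 0.5


print(threshold_line_y([5.1, 2.3, 0.4], threshold=2))
```

With two scores at or above 2 and the default geometry, the line sits at 35 + 3 * 21 - 0.5 = 97.5, that is, in the gap between the second and third off-target rows.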
|
@@ -0,0 +1,30 @@
|
|
1
|
+
__version__ = "2.7.10"
|
2
|
+
# 2023.08.11. v1.1.0 adding an option for not normalizing the bw file
|
3
|
+
# 2023.10.26. v1.9.0 prerelease for v2.0
|
4
|
+
# 2023.10.27. v2.0.0 major update, not yet fine-tuned
|
5
|
+
# 2023.10.28. v2.1.0 bug fixes; add signal-length calculation
|
6
|
+
# 2023.10.28. v2.2.0 bug fixes; change the algorithm for calculating signal length
|
7
|
+
# 2023.10.29. v2.3.0 add overall signal calculation
|
8
|
+
# 2023.11.01. v2.3.1 add the signal_only option
|
9
|
+
# 2023.11.02. v2.3.2 change the order in which sample signal and group mean are computed
|
10
|
+
# 2023.11.04. v2.3.3 fix incorrect sorting when normalizing the overall score
|
11
|
+
# 2023.11.05. v2.3.4 fix wrong column-name selection when detecting one-sided overflow signals
|
12
|
+
# 2023.11.13. v2.3.5 fine-tune the track score
|
13
|
+
# 2023.12.05. v2.3.6 candidates: add cleavage site; fix a misalignment bug when the alignment contains a deletion
|
14
|
+
# 2023.12.05. v2.3.7 use cleavage site instead of midpoint # not finished yet
|
15
|
+
# 2023.12.07. v2.3.8 df_score: add the individual df_exp and df_ctr columns. Fix a bug when df_ctr is absent. Track score uses proximal
|
16
|
+
# 2023.12.09. v2.4.0 to balance proximal and overall, add a bonus from the overall signal when the normalized overall signal exceeds 2
|
17
|
+
# 2023.12.09. v2.5.0 try new weighting positions
|
18
|
+
# 2023.12.10. v2.6.0 add the trackseq v4 calculation branch, which considers positive_pct within the Region to avoid short, sharp signals
|
19
|
+
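# 2023.12.10. v2.6.1 some nonspecific signals have large values; a large negative value in the control group can yield a falsely high signal after subtracting the control, so negative values are clipped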
|
20
|
+
# 2023.12.30. v2.7.0 add the X_offplot module for plotting
|
21
|
+
# 2023.12.31. v2.7.1 change the clip for negative control values from -5 to -1 to further reduce false positives. Also, overall is no longer added
|
22
|
+
# 2024.01.01. v2.7.2 weights changed to proximal + pct = 1 + 1. The criterion for guarding against signal-overflow false positives changed from <0 to <=0
|
23
|
+
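# 2024.01.02. v2.7.3 flank regions default changed to 1000 2000 3000 5000. Previously the clip on negative control values was effectively applied to the final score; now each value is clipped individually and the score is recomputed, with default CtrClip=-0.5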
|
24
|
+
# 2024.01.03. v2.7.4 updated blacklist.bed
|
25
|
+
# 2024.01.04. v2.7.5 updated the hg38 blacklist.bed
|
26
|
+
# 2024.01.12. v2.7.6 fix minor bugs; output FDR changed to <0.05.
|
27
|
+
# 2024.01.23. v2.7.7 Snakefile_offtracker: add --fixedStep to bigwigCompare for not merging neighbouring bins with equal values.
|
28
|
+
# 2024.02.01. v2.7.8 gradually adding X_offplot.py features
|
29
|
+
# 2024.06.02. v2.7.9 add offtracker_plot.py
|
30
|
+
# 2024.06.03. v2.7.10 fix bugs; offtable adds a dividing line at threshold = 2
|
@@ -0,0 +1,189 @@
|
|
1
|
+
Metadata-Version: 2.1
|
2
|
+
Name: offtracker
|
3
|
+
Version: 2.7.10
|
4
|
+
Summary: Tracking-seq data analysis
|
5
|
+
Home-page: https://github.com/Lan-lab/offtracker
|
6
|
+
Author: Runda Xu
|
7
|
+
Author-email: runda.xu@foxmail.com
|
8
|
+
Requires-Python: >=3.6.0
|
9
|
+
Description-Content-Type: text/markdown
|
10
|
+
License-File: LICENSE.txt
|
11
|
+
|
12
|
+
|
13
|
+
# OFF-TRACKER
|
14
|
+
|
15
|
+
OFF-TRACKER is an end to end pipeline of Tracking-seq data analysis for detecting off-target sites of any genome editing tools that generate double-strand breaks (DSBs) or single-strand breaks (SSBs).
|
16
|
+
|
17
|
+
## System requirements
|
18
|
+
|
19
|
+
* Linux/Unix
|
20
|
+
* Python >= 3.6
|
21
|
+
|
22
|
+
## Dependency
|
23
|
+
|
24
|
+
```bash
|
25
|
+
# We recommend creating a new environment using mamba/conda to avoid compatibility problems
|
26
|
+
# If you don't use mamba, just replace the code with conda
|
27
|
+
mamba create -n offtracker -c bioconda blast snakemake pybedtools
|
28
|
+
```
|
29
|
+
|
30
|
+
|
31
|
+
## Installation
|
32
|
+
|
33
|
+
```bash
|
34
|
+
# Activate the environment
|
35
|
+
conda activate offtracker
|
36
|
+
|
37
|
+
# Direct installation with pip
|
38
|
+
pip install offtracker
|
39
|
+
|
40
|
+
# (Alternative) Download the offtracker from github
|
41
|
+
git clone https://github.com/Lan-lab/offtracker.git
|
42
|
+
cd offtracker
|
43
|
+
pip install .
|
44
|
+
```
|
45
|
+
|
46
|
+
|
47
|
+
## Before analyzing samples
|
48
|
+
|
49
|
+
```bash
|
50
|
+
# Build blast index (only need once for each genome)
|
51
|
+
makeblastdb -input_type fasta -title hg38 -dbtype nucl -parse_seqids \
|
52
|
+
-in /Your_Path_To_Reference/hg38_genome.fa \
|
53
|
+
-out /Your_Path_To_Reference/hg38_genome.blastdb \
|
54
|
+
-logfile /Your_Path_To_Reference/hg38_genome.blastdb.log
|
55
|
+
|
56
|
+
# Build chromap index (only need once for each genome)
|
57
|
+
chromap -i -r /Your_Path_To_Reference/hg38_genome.fa \
|
58
|
+
-o /Your_Path_To_Reference/hg38_genome.chromap.index
|
59
|
+
|
60
|
+
# Generate candidate regions by sgRNA sequence (need once for each genome and sgRNA)
|
61
|
+
# --name: the name of the sgRNA, which will be used in the following analysis
|
62
|
+
offtracker_candidates.py -t 8 -g hg38 \
|
63
|
+
-r /Your_Path_To_Reference/hg38_genome.fa \
|
64
|
+
-b /Your_Path_To_Reference/hg38_genome.blastdb \
|
65
|
+
--name 'VEGFA2' --sgrna 'GACCCCCTCCACCCCGCCTC' --pam 'NGG' \
|
66
|
+
-o /Your_Path_To_Candidates
|
67
|
+
|
68
|
+
```
|
69
|
+
|
70
|
+
## Strand-specific mapping of Tracking-seq data
|
71
|
+
|
72
|
+
```bash
|
73
|
+
# Generate snakemake config file
|
74
|
+
# --subfolder: If different samples are in separate folders, set this to 1
|
75
|
+
# if -o is not set, the output will be in the same folder as the fastq files
|
76
|
+
offtracker_config.py -t 8 -g hg38 --blacklist hg38 \
|
77
|
+
-r /Your_Path_To_Reference/hg38_genome.fa \
|
78
|
+
-i /Your_Path_To_Reference/hg38_genome.chromap.index \
|
79
|
+
-f /Your_Path_To_Fastq \
|
80
|
+
-o /Your_Path_To_Output \
|
81
|
+
--subfolder 0
|
82
|
+
|
83
|
+
# Run the snakemake program
|
84
|
+
cd /Your_Path_To_Fastq
|
85
|
+
snakemake -np # dry run
|
86
|
+
nohup snakemake --cores 16 1>snakemake.log 2>snakemake.err &
|
87
|
+
|
88
|
+
## about cores
|
89
|
+
# --cores of snakemake must be larger than -t of offtracker_config.py
|
90
|
+
# parallel number = cores/t
|
91
|
+
|
92
|
+
## about output
|
93
|
+
# This part will generate "*.fw.scaled.bw" and "*.rv.scaled.bw" for IGV visualization
|
94
|
+
# "*.fw.bed" and "*.rv.bed" are used in the next part.
|
95
|
+
```
|
96
|
+
|
97
|
+
|
98
|
+
## Analyzing the genome-wide off-target sites
|
99
|
+
|
100
|
+
```bash
|
101
|
+
# In this part, multiple samples in the same condition can be analyzed in a single run by pattern matching on sample names
|
102
|
+
|
103
|
+
offtracker_analysis.py -g hg38 --name "VEGFA2" \
|
104
|
+
--exp 'Cas9_VEGFA2' \
|
105
|
+
--control 'WT' \
|
106
|
+
--outname 'Cas9_VEGFA_293' \
|
107
|
+
-f /Your_Path_To_Output \
|
108
|
+
--seqfolder /Your_Path_To_Candidates
|
109
|
+
|
110
|
+
# --name: the same gRNA name you set when running offtracker_candidates.py
|
111
|
+
# --exp/--control: add one or multiple patterns of file name in regular expressions
|
112
|
+
# If multiple samples meet the pattern, their signals will be averaged. Thus, only samples with the same condition should be included in a single analysis.
|
113
|
+
|
114
|
+
# This step will generate Offtracker_result_{outname}.csv
|
115
|
+
# Default FDR is 0.05, which can be changed by --fdr. This will empirically make the threshold of Track score around 2.
|
116
|
+
# Sites with Track score >=2, which is an empirical threshold, are output regardless of FDR.
|
117
|
+
# Intermediate files are saved in ./temp folder, which can be deleted.
|
118
|
+
# Keeping the intermediate files can make the analysis faster if involving previously analyzed samples (e.g. using the same control samples for different analyses)
|
119
|
+
```
|
120
|
+
|
121
|
+
## Off-target sequences visualization
|
122
|
+
|
123
|
+
```bash
|
124
|
+
# After getting the Offtracker_result_{outname}.csv, you can visualize the off-target sites and their genomic sequences with the following command:
|
125
|
+
|
126
|
+
offtracker_plot.py --result Your_Offtracker_Result_CSV \
|
127
|
+
--sgrna 'GACCCCCTCCACCCCGCCTC' --pam 'NGG'
|
128
|
+
|
129
|
+
# The default output is a pdf file with Offtracker_result_{outname}.pdf
|
130
|
+
# Change the suffix of the output file to change the format (e.g.: .png)
|
131
|
+
# The orange dash line indicates the empirical threshold of Track score = 2
|
132
|
+
# Empirically, the off-target sites with Track score < 2 are less likely to be real off-target sites.
|
133
|
+
```
|
134
|
+
|
135
|
+
|
136
|
+
## Note1
|
137
|
+
|
138
|
+
The default setting only includes chr1-chr22, chrX, chrY, and chrM. Please make sure the reference genome contains "chr" at the beginning.
|
139
|
+
|
140
|
+
Currently, this software is only ready-to-use for mm10 and hg38. For any other genome, e.g., hg19, please add a genome size file named "hg19.chrom.sizes" to ./offtracker/mapping and install manually. In addition, add "--blacklist none" or "--blacklist Your_Blacklist" (e.g., ENCODE blacklist) when running offtracker_config.py, because we only provide blacklists for mm10 and hg38.
|
141
|
+
|
142
|
+
If you have a requirement for species other than human/mouse, please post an issue.
|
143
|
+
|
144
|
+
## Note2
|
145
|
+
|
146
|
+
The FDRs in the Tracking-seq result do not reflect the real off-target probability.
|
147
|
+
It is strongly recommended to load the "fw.scaled.bw" and "rv.scaled.bw" files in a genome browser such as IGV and visually inspect each target location from the Tracking-seq result.
|
148
|
+
|
149
|
+
|
150
|
+
|
151
|
+
# Example Data
|
152
|
+
|
153
|
+
Here are example data that contains reads of chr6 from HEK293T cells edited with Cas9 + sgRNA VEGFA2 and wild type cells:
|
154
|
+
|
155
|
+
https://figshare.com/articles/dataset/WT_HEK239T_chr6/25956034
|
156
|
+
|
157
|
+
It takes about 5-10 minutes to run the mapping (offtracker_config.py & snakemake) of example data with -t 8 and --cores 16 (2 parallel tasks)
|
158
|
+
|
159
|
+
## Signal visualization
|
160
|
+
|
161
|
+
After mapping, there will be 4 .bw files in the output folder:
|
162
|
+
```bash
|
163
|
+
Cas9_VEGFA2_chr6.fw.scaled.bw
|
164
|
+
|
165
|
+
Cas9_VEGFA2_chr6.rv.scaled.bw
|
166
|
+
|
167
|
+
WT_chr6.fw.scaled.bw
|
168
|
+
|
169
|
+
WT_chr6.rv.scaled.bw
|
170
|
+
```
|
171
|
+
These files can be visualized in genome browser like IGV:
|
172
|
+
|
173
|
+

|
174
|
+
|
175
|
+
|
176
|
+
## Whole genome off-target analysis
|
177
|
+
|
178
|
+
For analyzing the signals (offtracker_analysis.py), it takes about 3-5 minutes and outputs a file named "Offtracker_result_{outname}.csv"
|
179
|
+
|
180
|
+
After that, you can visualize the off-target sites with their genomic sequence (offtracker_plot.py) and get an image like this:
|
181
|
+
|
182
|
+

|
183
|
+
|
184
|
+
# Citation
|
185
|
+
|
186
|
+
|
187
|
+
|
188
|
+
|
189
|
+
|
@@ -22,4 +22,5 @@ offtracker/mapping/offtracker_blacklist_hg38.merged.bed
|
|
22
22
|
offtracker/mapping/offtracker_blacklist_mm10.merged.bed
|
23
23
|
scripts/offtracker_analysis.py
|
24
24
|
scripts/offtracker_candidates.py
|
25
|
-
scripts/offtracker_config.py
|
25
|
+
scripts/offtracker_config.py
|
26
|
+
scripts/offtracker_plot.py
|
@@ -26,6 +26,7 @@ def main():
|
|
26
26
|
parser.add_argument('--name' , type=str, required=True, help='custom name of the sgRNA' )
|
27
27
|
parser.add_argument('--exp' , type=str, default='all', nargs='+', help='A substring mark in the name of experimental samples. The default is to use all samples other than control' )
|
28
28
|
parser.add_argument('--control' , type=str, default='none', nargs='+', help='A substring mark in the name of control samples. The default is no control. "others" for all samples other than --exp.' )
|
29
|
+
parser.add_argument('--fdr'         , type=float, default=0.05, help='FDR threshold for the final result. Default is 0.05.')
|
29
30
|
parser.add_argument('--smooth' , type=int, default=1, help='Smooth strength for the signal.')
|
30
31
|
parser.add_argument('--window' , type=int, default=3, help='Window size for smoothing the signal.')
|
31
32
|
parser.add_argument('--binsize' , type=int, default=100, help='Window size for smoothing the signal.')
|
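Because argparse applies `type` to values given on the command line, a fractional FDR threshold such as `0.01` is only accepted when the argument is parsed as a float. The standalone sketch below builds its own parser to illustrate this; it is not the offtracker script, and `build_parser` is a hypothetical helper:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a tiny parser with a float-valued --fdr option (illustration only)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--fdr', type=float, default=0.05,
                        help='FDR threshold for the final result. Default is 0.05.')
    return parser

# A fractional threshold passed explicitly is converted with float().
args = build_parser().parse_args(['--fdr', '0.01'])
print(args.fdr)  # 0.01

# Without the flag, the default is used as-is.
default_args = build_parser().parse_args([])
print(default_args.fdr)  # 0.05
```

With `type=int`, the same command line would be rejected, since `int('0.01')` raises `ValueError`.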
@@ -49,6 +50,7 @@ def main():
|
|
49
50
|
sgRNA_name = args.name
|
50
51
|
pattern_exp = args.exp
|
51
52
|
pattern_ctr = args.control
|
53
|
+
fdr_thresh = args.fdr
|
52
54
|
binsize = args.binsize
|
53
55
|
flank_max = args.flank_max
|
54
56
|
flank_regions = args.flank_regions
|
@@ -155,8 +157,11 @@ def main():
|
|
155
157
|
df_bdg.columns = ['chr','start','end','residual']
|
156
158
|
# Group df_bdg by chromosome
|
157
159
|
sample_groups = df_bdg.groupby('chr')
|
160
|
+
# 2024.06.03. Fix a bug that occurs when df_bdg has fewer chromosomes than df_candidate
|
161
|
+
total_chr = df_bdg['chr'].unique()
|
162
|
+
df_candidate_sub_temp = df_candidate_sub[df_candidate_sub['chr'].isin(total_chr)]
|
158
163
|
# Group df_candidate_sub by chromosome
|
159
|
-
candidate_groups =
|
164
|
+
candidate_groups = df_candidate_sub_temp.groupby('chr')
|
160
165
|
|
161
166
|
# Define an empty list to store the data for each chromosome
|
162
167
|
chrom_list = []
|
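The change in this hunk restricts the candidates to chromosomes that are present in the signal table before grouping. A plain-Python sketch of the same idea, without pandas; the rows are made-up toy values and the variable names only mirror the ones in the diff:

```python
from collections import defaultdict

# Toy stand-ins for df_bdg and df_candidate_sub (values invented for illustration).
bdg_rows = [
    {'chr': 'chr6', 'start': 0, 'end': 100, 'residual': 1.5},
    {'chr': 'chr6', 'start': 100, 'end': 200, 'residual': 0.2},
]
candidate_rows = [
    {'chr': 'chr1', 'start': 500, 'end': 523},
    {'chr': 'chr6', 'start': 120, 'end': 143},
]

# Chromosomes that actually have signal, analogous to df_bdg['chr'].unique()
total_chr = {row['chr'] for row in bdg_rows}

# Keep only candidates on those chromosomes, analogous to .isin(total_chr)
candidates_kept = [row for row in candidate_rows if row['chr'] in total_chr]

# Group the remaining candidates by chromosome, analogous to .groupby('chr')
candidate_groups = defaultdict(list)
for row in candidates_kept:
    candidate_groups[row['chr']].append(row)

print(sorted(candidate_groups))  # ['chr6']
```

Every chromosome key left in `candidate_groups` is then guaranteed to have a matching signal group, which is the situation the example data (chr6 only) requires.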
@@ -234,7 +239,8 @@ def main():
|
|
234
239
|
df_score = pd.concat([df_score, df_exp, df_ctr], axis=1)
|
235
240
|
else:
|
236
241
|
df_score = pd.concat([df_score, df_exp], axis=1)
|
237
|
-
|
242
|
+
# 2024.06.03. With the example data there is only chr6 and everything else is NaN; not dropping these rows breaks the downstream calculations
|
243
|
+
df_score = df_score.dropna().copy()
|
238
244
|
df_score.to_csv(output)
|
239
245
|
|
240
246
|
##########################
|
@@ -339,8 +345,10 @@ def main():
|
|
339
345
|
df_result['fdr'] = offtracker.fdr(df_result['pv'])
|
340
346
|
df_result['rank'] = range(1,len(df_result)+1)
|
341
347
|
df_result.to_csv(output)
|
342
|
-
|
343
|
-
|
348
|
+
# 2024.06.03. Prevent the fdr<=fdr_thresh filter from dropping sites with track_score>=2
|
349
|
+
bool_fdr = df_result['fdr']<=fdr_thresh
|
350
|
+
bool_score = df_result['track_score']>=2
|
351
|
+
df_output = df_result[bool_fdr|bool_score].copy()
|
344
352
|
if pattern_ctr != 'none':
|
345
353
|
df_output = df_output[['target_location', 'best_strand','best_target','deletion','insertion','mismatch',
|
346
354
|
'exp_L_length', 'exp_R_length','ctr_L_length','ctr_R_length','L_length','R_length','signal_length',
|
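The selection rule added in this hunk keeps a site when it passes the FDR threshold or when its track score is at least 2. A plain-Python sketch of that rule, without pandas; `select_sites` and the sample rows are hypothetical and only illustrate the boolean logic:

```python
def select_sites(rows, fdr_thresh=0.05, min_track_score=2):
    """Keep rows that pass the FDR threshold OR reach the track-score cutoff."""
    return [row for row in rows
            if row['fdr'] <= fdr_thresh or row['track_score'] >= min_track_score]

# Invented example rows, one per case of the OR condition.
rows = [
    {'site': 'A', 'fdr': 0.01, 'track_score': 5.0},  # passes both conditions
    {'site': 'B', 'fdr': 0.20, 'track_score': 2.5},  # kept by track score only
    {'site': 'C', 'fdr': 0.03, 'track_score': 1.0},  # kept by FDR only
    {'site': 'D', 'fdr': 0.50, 'track_score': 0.3},  # dropped
]

kept = [row['site'] for row in select_sites(rows)]
print(kept)  # ['A', 'B', 'C']
```

Site B shows the case the change is meant to protect: a high track score with an FDR above the threshold.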
@@ -0,0 +1,39 @@
|
|
1
|
+
#!/usr/bin/env python
|
2
|
+
# -*- coding: utf-8 -*-
|
3
|
+
|
4
|
+
import offtracker.X_offplot as xoffplot
|
5
|
+
import pandas as pd
|
6
|
+
import argparse
|
7
|
+
import os
|
8
|
+
|
9
|
+
def main():
|
10
|
+
parser = argparse.ArgumentParser()
|
11
|
+
parser.description='Plot the off-target sites with their genomic sequences.\nIf the .pdf file is too large, try a .png file instead.'
|
12
|
+
parser.add_argument('--result' , type=str, required=True, help='The file of Offtracker_result_{outname}.csv' )
|
13
|
+
parser.add_argument('--sgrna'  , type=str, required=True,  help='sgRNA sequence, not including the PAM' )
|
14
|
+
parser.add_argument('--pam' , type=str, default='NGG', help='PAM sequence. Default is "NGG".' )
|
15
|
+
parser.add_argument('--output' , type=str, default='same', help='The output file. Default is Offtracker_result_{outname}.pdf')
|
16
|
+
|
17
|
+
args = parser.parse_args()
|
18
|
+
if args.output == 'same':
|
19
|
+
dir_savefig = args.result.replace('.csv', '.pdf')
|
20
|
+
else:
|
21
|
+
dir_savefig = args.output
|
22
|
+
|
23
|
+
outname = os.path.basename(args.result).replace('Offtracker_result_', '').replace('.csv', '')
|
24
|
+
gRNA = args.sgrna
|
25
|
+
PAM = args.pam
|
26
|
+
full_seq = gRNA + PAM
|
27
|
+
|
28
|
+
df_result = pd.read_csv(args.result)
|
29
|
+
n_pos = len(df_result)
|
30
|
+
|
31
|
+
xoffplot.offtable(df_result, full_seq, length_pam = len(PAM), col_seq='target', threshold=2,
|
32
|
+
title=f'{outname} ({n_pos} sites)',
|
33
|
+
savefig=dir_savefig)
|
34
|
+
|
35
|
+
return f'The plot is saved as {dir_savefig}'
|
36
|
+
|
37
|
+
if __name__ == '__main__' :
|
38
|
+
result = main()
|
39
|
+
print(result)
|
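The script derives both the default output path and the plot title from the name of the result file. The sketch below isolates that string handling so it can be checked without running the plot; `derive_plot_names` is a hypothetical helper and the input path is an invented example:

```python
import os

def derive_plot_names(result_path: str, output: str = 'same'):
    """Return (figure path, outname) using the same replacements as the script."""
    if output == 'same':
        # Default: write the figure next to the CSV, with a .pdf extension.
        savefig = result_path.replace('.csv', '.pdf')
    else:
        savefig = output
    # Strip the fixed prefix and the extension to recover {outname}.
    outname = (os.path.basename(result_path)
               .replace('Offtracker_result_', '')
               .replace('.csv', ''))
    return savefig, outname

savefig, outname = derive_plot_names('/data/Offtracker_result_Cas9_VEGFA2.csv')
print(savefig)  # /data/Offtracker_result_Cas9_VEGFA2.pdf
print(outname)  # Cas9_VEGFA2
```

Passing an explicit `output` ending in `.png` changes only the figure path; the title still comes from the CSV file name.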
@@ -49,7 +49,10 @@ setup(
|
|
49
49
|
python_requires=REQUIRES_PYTHON,
|
50
50
|
packages=find_packages(),
|
51
51
|
package_data={'offtracker': ['mapping/*']},
|
52
|
-
scripts = ['scripts/offtracker_config.py',
|
52
|
+
scripts = ['scripts/offtracker_config.py',
|
53
|
+
'scripts/offtracker_candidates.py',
|
54
|
+
'scripts/offtracker_analysis.py',
|
55
|
+
'scripts/offtracker_plot.py'],
|
53
56
|
install_requires=REQUIRED,
|
54
57
|
include_package_data=True
|
55
58
|
)
|
offtracker-2.7.8/PKG-INFO
DELETED
@@ -1,146 +0,0 @@
|
|
1
|
-
Metadata-Version: 2.1
|
2
|
-
Name: offtracker
|
3
|
-
Version: 2.7.8
|
4
|
-
Summary: Tracking-seq data analysis
|
5
|
-
Home-page: https://github.com/Lan-lab/offtracker
|
6
|
-
Author: Runda Xu
|
7
|
-
Author-email: runda.xu@foxmail.com
|
8
|
-
Requires-Python: >=3.6.0
|
9
|
-
Description-Content-Type: text/markdown
|
10
|
-
License-File: LICENSE.txt
|
11
|
-
|
12
|
-
|
13
|
-
OFF-TRACKER
|
14
|
-
=======================
|
15
|
-
|
16
|
-
OFF-TRACKER is an end to end pipeline of Tracking-seq data analysis for detecting off-target sites of any genome editing tools that generate double-strand breaks (DSBs) or single-strand breaks (SSBs).
|
17
|
-
|
18
|
-
System requirements
|
19
|
-
-----
|
20
|
-
* Linux/Unix
|
21
|
-
* Python >= 3.6
|
22
|
-
|
23
|
-
Dependency
|
24
|
-
-----
|
25
|
-
|
26
|
-
```bash
|
27
|
-
# We recommend creating a new environment using mamba/conda to avoid compatibility problems
|
28
|
-
# If you don't use mamba, just replace the code with conda
|
29
|
-
mamba create -n offtracker -c bioconda blast snakemake pybedtools
|
30
|
-
```
|
31
|
-
|
32
|
-
|
33
|
-
Installation
|
34
|
-
-----
|
35
|
-
|
36
|
-
```bash
|
37
|
-
# activate the environment
|
38
|
-
conda activate offtracker
|
39
|
-
|
40
|
-
# Direct installation with pip
|
41
|
-
pip install offtracker
|
42
|
-
|
43
|
-
# (Alternative) Download the offtracker from github
|
44
|
-
git clone https://github.com/Lan-lab/offtracker.git
|
45
|
-
cd offtracker
|
46
|
-
pip install .
|
47
|
-
```
|
48
|
-
|
49
|
-
|
50
|
-
Before analyzing samples
|
51
|
-
-----
|
52
|
-
|
53
|
-
```bash
|
54
|
-
# Build blast index (only need once for each genome)
|
55
|
-
makeblastdb -input_type fasta -title hg38 -dbtype nucl -parse_seqids \
|
56
|
-
-in /Your_Path_To_Reference/hg38_genome.fa \
|
57
|
-
-out /Your_Path_To_Reference/hg38_genome.blastdb \
|
58
|
-
-logfile /Your_Path_To_Reference/hg38_genome.blastdb.log
|
59
|
-
|
60
|
-
# Build chromap index (only need once for each genome)
|
61
|
-
chromap -i -r /Your_Path_To_Reference/hg38_genome.fa \
|
62
|
-
-o /Your_Path_To_Reference/hg38_genome.chromap.index
|
63
|
-
|
64
|
-
# Generate candidate regions by sgRNA sequence (need once for each genome and sgRNA)
|
65
|
-
offtracker_candidates.py -t 8 -g hg38 \
|
66
|
-
-r /Your_Path_To_Reference/hg38_genome.fa \
|
67
|
-
-b /Your_Path_To_Reference/hg38_genome.blastdb \
|
68
|
-
--name 'HEK4' --sgrna 'GGCACTGCGGCTGGAGGTGG' --pam 'NGG' \
|
69
|
-
-o /Your_Path_To_Candidates
|
70
|
-
|
71
|
-
```
|
72
|
-
|
73
|
-
Strand-specific mapping of Tracking-seq data
|
74
|
-
-----
|
75
|
-
|
76
|
-
```bash
|
77
|
-
# Generate snakemake config file
|
78
|
-
offtracker_config.py -t 8 -g hg38 --blacklist hg38 \
|
79
|
-
-r /Your_Path_To_Reference/hg38_genome.fa \
|
80
|
-
-i /Your_Path_To_Reference/hg38_genome.chromap.index \
|
81
|
-
-f /Your_Path_To_Fastq \
|
82
|
-
-o /Your_Path_To_Output \
|
83
|
-
--subfolder 0
|
84
|
-
|
85
|
-
# --subfolder: If different samples are in separate folders, set this to 1
|
86
|
-
# -o: Default is outputting to /Your_Path_To_Fastq
|
87
|
-
|
88
|
-
# Run the snakemake program
|
89
|
-
cd /Your_Path_To_Fastq
|
90
|
-
snakemake -np # dry run
|
91
|
-
nohup snakemake --cores 16 1>snakemake.log 2>snakemake.err &
|
92
|
-
|
93
|
-
## about cores
|
94
|
-
# --cores of snakemake must be larger than -t of offtracker_config.py
|
95
|
-
# parallel number = cores/t
|
96
|
-
|
97
|
-
## about output
|
98
|
-
# This part will generate "*.fw.scaled.bw" and "*.rv.scaled.bw" for IGV visualization
|
99
|
-
# "*.fw.bed" and "*.rv.bed" are used in the next part.
|
100
|
-
```
|
101
|
-
|
102
|
-
|
103
|
-
Analyzing the off-target sites
|
104
|
-
-----
|
105
|
-
|
106
|
-
```bash
|
107
|
-
# In this part, multiple samples in the same condition can be analyzed in a single run by pattern recognition of sample names
|
108
|
-
|
109
|
-
offtracker_analysis.py -g hg38 --name "HEK4" \
|
110
|
-
--exp 'Cas9_HEK4.*293' \
|
111
|
-
--control 'control' \
|
112
|
-
--outname 'Cas9_HEK4_293' \
|
113
|
-
-f /Your_Path_To_Output \
|
114
|
-
--seqfolder /Your_Path_To_Candidates
|
115
|
-
|
116
|
-
# --name: the same as that in offtracker_candidates.py
|
117
|
-
# --exp/--control: add one or multiple patterns of file name in regex
|
118
|
-
|
119
|
-
|
120
|
-
# This step will generate Trackseq_result_{outname}.csv
|
121
|
-
# Intermediate files are saved in ./temp folder, which can be deleted
|
122
|
-
# Keeping the intermediate files can make the analysis faster if involving previously analyzed samples (e.g. using the same control samples for different analyses)
|
123
|
-
```
|
124
|
-
|
125
|
-
|
126
|
-
Note1
|
127
|
-
--------------
|
128
|
-
The default setting only includes chr1-chr22, chrX, chrY, and chrM.
|
129
|
-
|
130
|
-
Please make sure the reference genome contains "chr" at the beginning.
|
131
|
-
|
132
|
-
If you have requirement for other chromosomes or species other than human/mouse, please post an issue.
|
133
|
-
|
134
|
-
Note2
|
135
|
-
--------------
|
136
|
-
Currently, this software is only ready-to-use for mm10 and hg38.
|
137
|
-
|
138
|
-
For any other genome, say hg19, please add genome size file named "hg19.chrom.sizes" to .\offtracker\mapping before install.
|
139
|
-
|
140
|
-
Besides, add "--blacklist none" or "--blacklist Your_Blacklist" when running offtracker_config.py
|
141
|
-
|
142
|
-
Note3
|
143
|
-
--------------
|
144
|
-
The FDR in the Tracking-seq result is not rigorous to the real off-target probability.
|
145
|
-
It is strongly recommended to observe the "fw.scaled.bw" and "rv.scaled.bw" using IGV to check each target location from the Tracking-seq result.
|
146
|
-
|
offtracker-2.7.8/README.md
DELETED
@@ -1,134 +0,0 @@
|
|
1
|
-
OFF-TRACKER
|
2
|
-
=======================
|
3
|
-
|
4
|
-
OFF-TRACKER is an end to end pipeline of Tracking-seq data analysis for detecting off-target sites of any genome editing tools that generate double-strand breaks (DSBs) or single-strand breaks (SSBs).
|
5
|
-
|
6
|
-
System requirements
|
7
|
-
-----
|
8
|
-
* Linux/Unix
|
9
|
-
* Python >= 3.6
|
10
|
-
|
11
|
-
Dependency
|
12
|
-
-----
|
13
|
-
|
14
|
-
```bash
|
15
|
-
# We recommend creating a new environment using mamba/conda to avoid compatibility problems
|
16
|
-
# If you don't use mamba, just replace the code with conda
|
17
|
-
mamba create -n offtracker -c bioconda blast snakemake pybedtools
|
18
|
-
```
|
19
|
-
|
20
|
-
|
21
|
-
Installation
|
22
|
-
-----
|
23
|
-
|
24
|
-
```bash
|
25
|
-
# activate the environment
|
26
|
-
conda activate offtracker
|
27
|
-
|
28
|
-
# Direct installation with pip
|
29
|
-
pip install offtracker
|
30
|
-
|
31
|
-
# (Alternative) Download the offtracker from github
|
32
|
-
git clone https://github.com/Lan-lab/offtracker.git
|
33
|
-
cd offtracker
|
34
|
-
pip install .
|
35
|
-
```
|
36
|
-
|
37
|
-
|
38
|
-
Before analyzing samples
|
39
|
-
-----
|
40
|
-
|
41
|
-
```bash
|
42
|
-
# Build blast index (only need once for each genome)
|
43
|
-
makeblastdb -input_type fasta -title hg38 -dbtype nucl -parse_seqids \
|
44
|
-
-in /Your_Path_To_Reference/hg38_genome.fa \
|
45
|
-
-out /Your_Path_To_Reference/hg38_genome.blastdb \
|
46
|
-
-logfile /Your_Path_To_Reference/hg38_genome.blastdb.log
|
47
|
-
|
48
|
-
# Build chromap index (only need once for each genome)
|
49
|
-
chromap -i -r /Your_Path_To_Reference/hg38_genome.fa \
|
50
|
-
-o /Your_Path_To_Reference/hg38_genome.chromap.index
|
51
|
-
|
52
|
-
# Generate candidate regions by sgRNA sequence (need once for each genome and sgRNA)
|
53
|
-
offtracker_candidates.py -t 8 -g hg38 \
|
54
|
-
-r /Your_Path_To_Reference/hg38_genome.fa \
|
55
|
-
-b /Your_Path_To_Reference/hg38_genome.blastdb \
|
56
|
-
--name 'HEK4' --sgrna 'GGCACTGCGGCTGGAGGTGG' --pam 'NGG' \
|
57
|
-
-o /Your_Path_To_Candidates
|
58
|
-
|
59
|
-
```
|
60
|
-
|
61
|
-
Strand-specific mapping of Tracking-seq data
|
62
|
-
-----
|
63
|
-
|
64
|
-
```bash
|
65
|
-
# Generate snakemake config file
|
66
|
-
offtracker_config.py -t 8 -g hg38 --blacklist hg38 \
|
67
|
-
-r /Your_Path_To_Reference/hg38_genome.fa \
|
68
|
-
-i /Your_Path_To_Reference/hg38_genome.chromap.index \
|
69
|
-
-f /Your_Path_To_Fastq \
|
70
|
-
-o /Your_Path_To_Output \
|
71
|
-
--subfolder 0
|
72
|
-
|
73
|
-
# --subfolder: If different samples are in separate folders, set this to 1
|
74
|
-
# -o: Default is outputting to /Your_Path_To_Fastq
|
75
|
-
|
76
|
-
# Run the snakemake program
|
77
|
-
cd /Your_Path_To_Fastq
|
78
|
-
snakemake -np # dry run
|
79
|
-
nohup snakemake --cores 16 1>snakemake.log 2>snakemake.err &
|
80
|
-
|
81
|
-
## about cores
|
82
|
-
# --cores of snakemake must be larger than -t of offtracker_config.py
|
83
|
-
# parallel number = cores/t
|
84
|
-
|
85
|
-
## about output
|
86
|
-
# This part will generate "*.fw.scaled.bw" and "*.rv.scaled.bw" for IGV visualization
|
87
|
-
# "*.fw.bed" and "*.rv.bed" are used in the next part.
|
88
|
-
```
|
89
|
-
|
90
|
-
|
91
|
-
Analyzing the off-target sites
|
92
|
-
-----
|
93
|
-
|
94
|
-
```bash
|
95
|
-
# In this part, multiple samples in the same condition can be analyzed in a single run by pattern recognition of sample names
|
96
|
-
|
97
|
-
offtracker_analysis.py -g hg38 --name "HEK4" \
|
98
|
-
--exp 'Cas9_HEK4.*293' \
|
99
|
-
--control 'control' \
|
100
|
-
--outname 'Cas9_HEK4_293' \
|
101
|
-
-f /Your_Path_To_Output \
|
102
|
-
--seqfolder /Your_Path_To_Candidates
|
103
|
-
|
104
|
-
# --name: the same as that in offtracker_candidates.py
|
105
|
-
# --exp/--control: add one or multiple patterns of file name in regex
|
106
|
-
|
107
|
-
|
108
|
-
# This step will generate Trackseq_result_{outname}.csv
|
109
|
-
# Intermediate files are saved in ./temp folder, which can be deleted
|
110
|
-
# Keeping the intermediate files can make the analysis faster if involving previously analyzed samples (e.g. using the same control samples for different analyses)
|
111
|
-
```
|
112
|
-
|
113
|
-
|
114
|
-
Note1
|
115
|
-
--------------
|
116
|
-
The default setting only includes chr1-chr22, chrX, chrY, and chrM.
|
117
|
-
|
118
|
-
Please make sure the reference genome contains "chr" at the beginning.
|
119
|
-
|
120
|
-
If you have requirement for other chromosomes or species other than human/mouse, please post an issue.
|
121
|
-
|
122
|
-
Note2
|
123
|
-
--------------
|
124
|
-
Currently, this software is only ready-to-use for mm10 and hg38.
|
125
|
-
|
126
|
-
For any other genome, say hg19, please add genome size file named "hg19.chrom.sizes" to .\offtracker\mapping before install.
|
127
|
-
|
128
|
-
Besides, add "--blacklist none" or "--blacklist Your_Blacklist" when running offtracker_config.py
|
129
|
-
|
130
|
-
Note3
|
131
|
-
--------------
|
132
|
-
The FDR in the Tracking-seq result is not rigorous to the real off-target probability.
|
133
|
-
It is strongly recommended to observe the "fw.scaled.bw" and "rv.scaled.bw" using IGV to check each target location from the Tracking-seq result.
|
134
|
-
|
@@ -1,28 +0,0 @@
|
|
1
|
-
__version__ = "2.7.8"
|
2
|
-
# 2023.08.11. v1.1.0 adding an option for not normalizing the bw file
|
3
|
-
# 2023.10.26. v1.9.0 prerelease for v2.0
|
4
|
-
# 2023.10.27. v2.0.0 Major update, not yet fine-tuned
|
5
|
-
# 2023.10.28. v2.1.0 Fix bugs; add signal-length calculation
|
6
|
-
# 2023.10.28. v2.2.0 Fix bugs; change the signal-length algorithm
|
7
|
-
# 2023.10.29. v2.3.0 Add overall signal calculation
|
8
|
-
# 2023.11.01. v2.3.1 Add the signal_only option
|
9
|
-
# 2023.11.02. v2.3.2 Change the calculation order of sample signal and group mean
|
10
|
-
# 2023.11.04. v2.3.3 Fix a sorting error when normalizing the overall score
|
11
|
-
# 2023.11.05. v2.3.4 Fix wrong column-name selection when checking for one-sided overflow signal
|
12
|
-
# 2023.11.13. v2.3.5 Fine-tune the track score
|
13
|
-
# 2023.12.05. v2.3.6 Add cleavage site to candidates; fix a bug where alignments with a deletion were misaligned
|
14
|
-
# 2023.12.05. v2.3.7 Use cleavage site instead of midpoint # not finished yet
|
15
|
-
# 2023.12.07. v2.3.8 Add separate df_exp and df_ctr columns to df_score. Fix a bug when there is no df_ctr. Track score uses proximal
|
16
|
-
# 2023.12.09. v2.4.0 To balance proximal and overall, add a bonus from the overall signal when the normalized overall signal is above 2
|
17
|
-
# 2023.12.09. v2.5.0 Try new weighting positions
|
18
|
-
# 2023.12.10. v2.6.0 Add the trackseq v4 calculation branch, i.e. take positive_pct within the Region into account to avoid short, sharp signals
|
19
|
-
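# 2023.12.10. v2.6.1 Some non-specific signals have large values; a large negative value in the control group can produce a falsely high signal after subtracting the control, so negative values are clipped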
|
20
|
-
# 2023.12.30. v2.7.0 Add the X_offplot module for plotting
|
21
|
-
# 2023.12.31. v2.7.1 Change the clip for negative control values from -5 to -1 to further reduce false positives. Also, overall is no longer added
|
22
|
-
# 2024.01.01. v2.7.2 Change weights to proximal + pct = 1 + 1. Change the criterion for guarding against signal-overflow false positives from <0 to <=0
|
23
|
-
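# 2024.01.02. v2.7.3 Change the default flank regions to 1000 2000 3000 5000. Previously the clip for negative control values was effectively applied directly to the final score; now each value is clipped individually and the score is recomputed, with default CtrClip=-0.5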
|
24
|
-
# 2024.01.03. v2.7.4 Update blacklist.bed
|
25
|
-
# 2024.01.04. v2.7.5 Update hg38 blacklist.bed
|
26
|
-
# 2024.01.12. v2.7.6 Fix a minor bug; change the output fdr to <0.05.
|
27
|
-
# 2024.01.23. v2.7.7 Snakefile_offtracker: add --fixedStep to bigwigCompare for not merging neighbouring bins with equal values.
|
28
|
-
# 2024.02.01. v2.7.8 Gradually add X_offplot.py features
|
@@ -1,146 +0,0 @@
|
|
1
|
-
Metadata-Version: 2.1
|
2
|
-
Name: offtracker
|
3
|
-
Version: 2.7.8
|
4
|
-
Summary: Tracking-seq data analysis
|
5
|
-
Home-page: https://github.com/Lan-lab/offtracker
|
6
|
-
Author: Runda Xu
|
7
|
-
Author-email: runda.xu@foxmail.com
|
8
|
-
Requires-Python: >=3.6.0
|
9
|
-
Description-Content-Type: text/markdown
|
10
|
-
License-File: LICENSE.txt
|
11
|
-
|
12
|
-
|
13
|
-
OFF-TRACKER
|
14
|
-
=======================
|
15
|
-
|
16
|
-
OFF-TRACKER is an end to end pipeline of Tracking-seq data analysis for detecting off-target sites of any genome editing tools that generate double-strand breaks (DSBs) or single-strand breaks (SSBs).
|
17
|
-
|
18
|
-
System requirements
|
19
|
-
-----
|
20
|
-
* Linux/Unix
|
21
|
-
* Python >= 3.6
|
22
|
-
|
23
|
-
Dependency
|
24
|
-
-----
|
25
|
-
|
26
|
-
```bash
|
27
|
-
# We recommend creating a new environment using mamba/conda to avoid compatibility problems
|
28
|
-
# If you don't use mamba, just replace the code with conda
|
29
|
-
mamba create -n offtracker -c bioconda blast snakemake pybedtools
|
30
|
-
```
|
31
|
-
|
32
|
-
|
33
|
-
Installation
|
34
|
-
-----
|
35
|
-
|
36
|
-
```bash
|
37
|
-
# activate the environment
|
38
|
-
conda activate offtracker
|
39
|
-
|
40
|
-
# Direct installation with pip
|
41
|
-
pip install offtracker
|
42
|
-
|
43
|
-
# (Alternative) Download the offtracker from github
|
44
|
-
git clone https://github.com/Lan-lab/offtracker.git
|
45
|
-
cd offtracker
|
46
|
-
pip install .
|
47
|
-
```
|
48
|
-
|
49
|
-
|
50
|
-
Before analyzing samples
|
51
|
-
-----
|
52
|
-
|
53
|
-
```bash
|
54
|
-
# Build blast index (only need once for each genome)
|
55
|
-
makeblastdb -input_type fasta -title hg38 -dbtype nucl -parse_seqids \
|
56
|
-
-in /Your_Path_To_Reference/hg38_genome.fa \
|
57
|
-
-out /Your_Path_To_Reference/hg38_genome.blastdb \
|
58
|
-
-logfile /Your_Path_To_Reference/hg38_genome.blastdb.log
|
59
|
-
|
60
|
-
# Build chromap index (only need once for each genome)
|
61
|
-
chromap -i -r /Your_Path_To_Reference/hg38_genome.fa \
|
62
|
-
-o /Your_Path_To_Reference/hg38_genome.chromap.index
|
63
|
-
|
64
|
-
# Generate candidate regions by sgRNA sequence (need once for each genome and sgRNA)
|
65
|
-
offtracker_candidates.py -t 8 -g hg38 \
|
66
|
-
-r /Your_Path_To_Reference/hg38_genome.fa \
|
67
|
-
-b /Your_Path_To_Reference/hg38_genome.blastdb \
|
68
|
-
--name 'HEK4' --sgrna 'GGCACTGCGGCTGGAGGTGG' --pam 'NGG' \
|
69
|
-
-o /Your_Path_To_Candidates
|
70
|
-
|
71
|
-
```
|
72
|
-
|
73
|
-
Strand-specific mapping of Tracking-seq data
|
74
|
-
-----
|
75
|
-
|
76
|
-
```bash
|
77
|
-
# Generate snakemake config file
|
78
|
-
offtracker_config.py -t 8 -g hg38 --blacklist hg38 \
|
79
|
-
-r /Your_Path_To_Reference/hg38_genome.fa \
|
80
|
-
-i /Your_Path_To_Reference/hg38_genome.chromap.index \
|
81
|
-
-f /Your_Path_To_Fastq \
|
82
|
-
-o /Your_Path_To_Output \
|
83
|
-
--subfolder 0
|
84
|
-
|
85
|
-
# --subfolder: If different samples are in separate folders, set this to 1
|
86
|
-
# -o: Default is outputting to /Your_Path_To_Fastq
|
87
|
-
|
88
|
-
# Run the snakemake program
|
89
|
-
cd /Your_Path_To_Fastq
|
90
|
-
snakemake -np # dry run
|
91
|
-
nohup snakemake --cores 16 1>snakemake.log 2>snakemake.err &
|
92
|
-
|
93
|
-
## about cores
|
94
|
-
# --cores of snakemake must be larger than -t of offtracker_config.py
|
95
|
-
# parallel number = cores/t
|
96
|
-
|
97
|
-
## about output
|
98
|
-
# This part will generate "*.fw.scaled.bw" and "*.rv.scaled.bw" for IGV visualization
|
99
|
-
# "*.fw.bed" and "*.rv.bed" are used in the next part.
|
100
|
-
```
|
101
|
-
|
102
|
-
|
103
|
-
Analyzing the off-target sites
|
104
|
-
-----
|
105
|
-
|
106
|
-
```bash
|
107
|
-
# In this part, multiple samples in the same condition can be analyzed in a single run by pattern recognition of sample names
|
108
|
-
|
109
|
-
offtracker_analysis.py -g hg38 --name "HEK4" \
|
110
|
-
--exp 'Cas9_HEK4.*293' \
|
111
|
-
--control 'control' \
|
112
|
-
--outname 'Cas9_HEK4_293' \
|
113
|
-
-f /Your_Path_To_Output \
|
114
|
-
--seqfolder /Your_Path_To_Candidates
|
115
|
-
|
116
|
-
# --name: the same as that in offtracker_candidates.py
|
117
|
-
# --exp/--control: add one or multiple patterns of file name in regex
|
118
|
-
|
119
|
-
|
120
|
-
# This step will generate Trackseq_result_{outname}.csv
|
121
|
-
# Intermediate files are saved in ./temp folder, which can be deleted
|
122
|
-
# Keeping the intermediate files can make the analysis faster if involving previously analyzed samples (e.g. using the same control samples for different analyses)
|
123
|
-
```
|
124
|
-
|
125
|
-
|
126
|
-
Note1
|
127
|
-
--------------
|
128
|
-
The default setting only includes chr1-chr22, chrX, chrY, and chrM.
|
129
|
-
|
130
|
-
Please make sure the reference genome contains "chr" at the beginning.
|
131
|
-
|
132
|
-
If you have requirement for other chromosomes or species other than human/mouse, please post an issue.
|
133
|
-
|
134
|
-
Note2
|
135
|
-
--------------
|
136
|
-
Currently, this software is only ready-to-use for mm10 and hg38.
|
137
|
-
|
138
|
-
For any other genome, say hg19, please add genome size file named "hg19.chrom.sizes" to .\offtracker\mapping before install.
|
139
|
-
|
140
|
-
Besides, add "--blacklist none" or "--blacklist Your_Blacklist" when running offtracker_config.py
|
141
|
-
|
142
|
-
Note3
|
143
|
-
--------------
|
144
|
-
The FDR in the Tracking-seq result is not rigorous to the real off-target probability.
|
145
|
-
It is strongly recommended to observe the "fw.scaled.bw" and "rv.scaled.bw" using IGV to check each target location from the Tracking-seq result.
|
146
|
-
|
File without changes
|
{offtracker-2.7.8 → offtracker-2.7.10}/offtracker/mapping/offtracker_blacklist_hg38.merged.bed
RENAMED
File without changes
|
{offtracker-2.7.8 → offtracker-2.7.10}/offtracker/mapping/offtracker_blacklist_mm10.merged.bed
RENAMED
File without changes
|