offtracker 2.7.7__zip → 2.7.10__zip

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (33)
  1. offtracker-2.7.10/PKG-INFO +189 -0
  2. offtracker-2.7.10/README.md +177 -0
  3. offtracker-2.7.10/offtracker/X_offplot.py +539 -0
  4. {offtracker-2.7.7 → offtracker-2.7.10}/offtracker/X_offtracker.py +2 -1
  5. offtracker-2.7.10/offtracker/_version.py +30 -0
  6. offtracker-2.7.10/offtracker.egg-info/PKG-INFO +189 -0
  7. {offtracker-2.7.7 → offtracker-2.7.10}/offtracker.egg-info/SOURCES.txt +2 -1
  8. {offtracker-2.7.7 → offtracker-2.7.10}/scripts/offtracker_analysis.py +18 -9
  9. offtracker-2.7.10/scripts/offtracker_plot.py +39 -0
  10. {offtracker-2.7.7 → offtracker-2.7.10}/setup.py +5 -2
  11. offtracker-2.7.7/PKG-INFO +0 -146
  12. offtracker-2.7.7/README.md +0 -134
  13. offtracker-2.7.7/offtracker/X_offplot.py +0 -123
  14. offtracker-2.7.7/offtracker/_version.py +0 -27
  15. offtracker-2.7.7/offtracker.egg-info/PKG-INFO +0 -146
  16. {offtracker-2.7.7 → offtracker-2.7.10}/LICENSE.txt +0 -0
  17. {offtracker-2.7.7 → offtracker-2.7.10}/MANIFEST.in +0 -0
  18. {offtracker-2.7.7 → offtracker-2.7.10}/offtracker/X_sequence.py +0 -0
  19. {offtracker-2.7.7 → offtracker-2.7.10}/offtracker/__init__.py +0 -0
  20. {offtracker-2.7.7 → offtracker-2.7.10}/offtracker/mapping/1.1_bed2fr_v4.5.py +0 -0
  21. {offtracker-2.7.7 → offtracker-2.7.10}/offtracker/mapping/1.3_bdg_normalize_v4.0.py +0 -0
  22. {offtracker-2.7.7 → offtracker-2.7.10}/offtracker/mapping/Snakefile_offtracker +0 -0
  23. {offtracker-2.7.7 → offtracker-2.7.10}/offtracker/mapping/bedGraphToBigWig +0 -0
  24. {offtracker-2.7.7 → offtracker-2.7.10}/offtracker/mapping/hg38.chrom.sizes +0 -0
  25. {offtracker-2.7.7 → offtracker-2.7.10}/offtracker/mapping/mm10.chrom.sizes +0 -0
  26. {offtracker-2.7.7 → offtracker-2.7.10}/offtracker/mapping/offtracker_blacklist_hg38.merged.bed +0 -0
  27. {offtracker-2.7.7 → offtracker-2.7.10}/offtracker/mapping/offtracker_blacklist_mm10.merged.bed +0 -0
  28. {offtracker-2.7.7 → offtracker-2.7.10}/offtracker.egg-info/dependency_links.txt +0 -0
  29. {offtracker-2.7.7 → offtracker-2.7.10}/offtracker.egg-info/requires.txt +0 -0
  30. {offtracker-2.7.7 → offtracker-2.7.10}/offtracker.egg-info/top_level.txt +0 -0
  31. {offtracker-2.7.7 → offtracker-2.7.10}/scripts/offtracker_candidates.py +0 -0
  32. {offtracker-2.7.7 → offtracker-2.7.10}/scripts/offtracker_config.py +0 -0
  33. {offtracker-2.7.7 → offtracker-2.7.10}/setup.cfg +0 -0
@@ -0,0 +1,189 @@
1
+ Metadata-Version: 2.1
2
+ Name: offtracker
3
+ Version: 2.7.10
4
+ Summary: Tracking-seq data analysis
5
+ Home-page: https://github.com/Lan-lab/offtracker
6
+ Author: Runda Xu
7
+ Author-email: runda.xu@foxmail.com
8
+ Requires-Python: >=3.6.0
9
+ Description-Content-Type: text/markdown
10
+ License-File: LICENSE.txt
11
+
12
+
13
+ # OFF-TRACKER
14
+
15
+ OFF-TRACKER is an end-to-end pipeline for Tracking-seq data analysis that detects off-target sites of any genome editing tool that generates double-strand breaks (DSBs) or single-strand breaks (SSBs).
16
+
17
+ ## System requirements
18
+
19
+ * Linux/Unix
20
+ * Python >= 3.6
21
+
22
+ ## Dependency
23
+
24
+ ```bash
25
+ # We recommend creating a new environment using mamba/conda to avoid compatibility problems
26
+ # If you don't use mamba, replace "mamba" with "conda" in the command below
27
+ mamba create -n offtracker -c bioconda blast snakemake pybedtools
28
+ ```
29
+
30
+
31
+ ## Installation
32
+
33
+ ```bash
34
+ # Activate the environment
35
+ conda activate offtracker
36
+
37
+ # Direct installation with pip
38
+ pip install offtracker
39
+
40
+ # (Alternative) Download the offtracker from github
41
+ git clone https://github.com/Lan-lab/offtracker.git
42
+ cd offtracker
43
+ pip install .
44
+ ```
45
+
46
+
47
+ ## Before analyzing samples
48
+
49
+ ```bash
50
+ # Build blast index (only need once for each genome)
51
+ makeblastdb -input_type fasta -title hg38 -dbtype nucl -parse_seqids \
52
+ -in /Your_Path_To_Reference/hg38_genome.fa \
53
+ -out /Your_Path_To_Reference/hg38_genome.blastdb \
54
+ -logfile /Your_Path_To_Reference/hg38_genome.blastdb.log
55
+
56
+ # Build chromap index (only need once for each genome)
57
+ chromap -i -r /Your_Path_To_Reference/hg38_genome.fa \
58
+ -o /Your_Path_To_Reference/hg38_genome.chromap.index
59
+
60
+ # Generate candidate regions by sgRNA sequence (need once for each genome and sgRNA)
61
+ # --name: the name of the sgRNA, which will be used in the following analysis
62
+ offtracker_candidates.py -t 8 -g hg38 \
63
+ -r /Your_Path_To_Reference/hg38_genome.fa \
64
+ -b /Your_Path_To_Reference/hg38_genome.blastdb \
65
+ --name 'VEGFA2' --sgrna 'GACCCCCTCCACCCCGCCTC' --pam 'NGG' \
66
+ -o /Your_Path_To_Candidates
67
+
68
+ ```
69
+
70
+ ## Strand-specific mapping of Tracking-seq data
71
+
72
+ ```bash
73
+ # Generate snakemake config file
74
+ # --subfolder: If different samples are in separate folders, set this to 1
75
+ # if -o is not set, the output will be in the same folder as the fastq files
76
+ offtracker_config.py -t 8 -g hg38 --blacklist hg38 \
77
+ -r /Your_Path_To_Reference/hg38_genome.fa \
78
+ -i /Your_Path_To_Reference/hg38_genome.chromap.index \
79
+ -f /Your_Path_To_Fastq \
80
+ -o /Your_Path_To_Output \
81
+ --subfolder 0
82
+
83
+ # Run the snakemake program
84
+ cd /Your_Path_To_Fastq
85
+ snakemake -np # dry run
86
+ nohup snakemake --cores 16 1>snakemake.log 2>snakemake.err &
87
+
88
+ ## about cores
89
+ # --cores of snakemake must be larger than -t of offtracker_config.py
90
+ # parallel number = cores/t
91
+
92
+ ## about output
93
+ # This part will generate "*.fw.scaled.bw" and "*.rv.scaled.bw" for IGV visualization
94
+ # "*.fw.bed" and "*.rv.bed" are used in the next part.
95
+ ```
96
+
97
+
98
+ ## Analyzing the genome-wide off-target sites
99
+
100
+ ```bash
101
+ # In this part, multiple samples in the same condition can be analyzed in a single run by pattern matching on sample names
102
+
103
+ offtracker_analysis.py -g hg38 --name "VEGFA2" \
104
+ --exp 'Cas9_VEGFA2' \
105
+ --control 'WT' \
106
+ --outname 'Cas9_VEGFA_293' \
107
+ -f /Your_Path_To_Output \
108
+ --seqfolder /Your_Path_To_Candidates
109
+
110
+ # --name: the same gRNA name you set when running offtracker_candidates.py
111
+ # --exp/--control: add one or multiple patterns of file name in regular expressions
112
+ # If multiple samples meet the pattern, their signals will be averaged. Thus, only samples with the same condition should be included in a single analysis.
113
+
114
+ # This step will generate Offtracker_result_{outname}.csv
115
+ # Default FDR is 0.05, which can be changed by --fdr. This will empirically make the threshold of Track score around 2.
116
+ # Sites with Track score >=2, which is an empirical threshold, are output regardless of FDR.
117
+ # Intermediate files are saved in ./temp folder, which can be deleted.
118
+ # Keeping the intermediate files can make the analysis faster if involving previously analyzed samples (e.g. using the same control samples for different analyses)
119
+ ```
120
+
121
+ ## Off-target sequences visualization
122
+
123
+ ```bash
124
+ # After obtaining Offtracker_result_{outname}.csv, you can visualize the off-target sites with their genomic sequences using the following command:
125
+
126
+ offtracker_plot.py --result Your_Offtracker_Result_CSV \
127
+ --sgrna 'GACCCCCTCCACCCCGCCTC' --pam 'NGG'
128
+
129
+ # The default output is a PDF file named Offtracker_result_{outname}.pdf
130
+ # Change the suffix of the output file to change the format (e.g.: .png)
131
+ # The orange dashed line indicates the empirical threshold of Track score = 2
132
+ # Empirically, the off-target sites with Track score < 2 are less likely to be real off-target sites.
133
+ ```
134
+
135
+
136
+ ## Note1
137
+
138
+ The default setting only includes chr1-chr22, chrX, chrY, and chrM. Please make sure the reference genome contains "chr" at the beginning.
139
+
140
+ Currently, this software is only ready-to-use for mm10 and hg38. For any other genome, e.g., hg19, please add a genome size file named "hg19.chrom.sizes" to .\offtracker\mapping and install manually. In addition, add "--blacklist none" or "--blacklist Your_Blacklist" (e.g., ENCODE blacklist) when running offtracker_config.py, because we only provide blacklists for mm10 and hg38.
141
+
142
+ If you have a requirement for species other than human/mouse, please post an issue.
143
+
144
+ ## Note2
145
+
146
+ The FDRs in the Tracking-seq result do not reflect the real off-target probability.
147
+ It is strongly recommended to view the "fw.scaled.bw" and "rv.scaled.bw" files in a genome browser such as IGV and visually inspect each target location from the Tracking-seq result.
148
+
149
+
150
+
151
+ # Example Data
152
+
153
+ Here is an example dataset that contains chr6 reads from HEK293T cells edited with Cas9 + sgRNA VEGFA2 and from wild-type cells:
154
+
155
+ https://figshare.com/articles/dataset/WT_HEK239T_chr6/25956034
156
+
157
+ It takes about 5-10 minutes to run the mapping (offtracker_config.py & snakemake) of example data with -t 8 and --cores 16 (2 parallel tasks)
158
+
159
+ ## Signal visualization
160
+
161
+ After mapping, there will be 4 .bw files in the output folder:
162
+ ```bash
163
+ Cas9_VEGFA2_chr6.fw.scaled.bw
164
+
165
+ Cas9_VEGFA2_chr6.rv.scaled.bw
166
+
167
+ WT_chr6.fw.scaled.bw
168
+
169
+ WT_chr6.rv.scaled.bw
170
+ ```
171
+ These files can be visualized in a genome browser such as IGV:
172
+
173
+ ![signal](https://github.com/Lan-lab/offtracker/blob/main/example_output/signals_example.png?raw=true)
174
+
175
+
176
+ ## Whole genome off-target analysis
177
+
178
+ For analyzing the signals (offtracker_analysis.py), it takes about 3-5 minutes and outputs a file named "Offtracker_result_{outname}.csv"
179
+
180
+ After that, you can visualize the off-target sites with their genomic sequence (offtracker_plot.py) and get an image like this:
181
+
182
+ ![offtarget](https://github.com/Lan-lab/offtracker/blob/main/example_output/sequences_example.png?raw=true)
183
+
184
+ # Citation
185
+
186
+
187
+
188
+
189
+
@@ -22,4 +22,5 @@ offtracker/mapping/offtracker_blacklist_hg38.merged.bed
22
22
  offtracker/mapping/offtracker_blacklist_mm10.merged.bed
23
23
  scripts/offtracker_analysis.py
24
24
  scripts/offtracker_candidates.py
25
- scripts/offtracker_config.py
25
+ scripts/offtracker_config.py
26
+ scripts/offtracker_plot.py
@@ -26,6 +26,7 @@ def main():
26
26
  parser.add_argument('--name' , type=str, required=True, help='custom name of the sgRNA' )
27
27
  parser.add_argument('--exp' , type=str, default='all', nargs='+', help='A substring mark in the name of experimental samples. The default is to use all samples other than control' )
28
28
  parser.add_argument('--control' , type=str, default='none', nargs='+', help='A substring mark in the name of control samples. The default is no control. "others" for all samples other than --exp.' )
29
+ parser.add_argument('--fdr' , type=int, default=0.05, help='FDR threshold for the final result. Default is 0.05.')
29
30
  parser.add_argument('--smooth' , type=int, default=1, help='Smooth strength for the signal.')
30
31
  parser.add_argument('--window' , type=int, default=3, help='Window size for smoothing the signal.')
31
32
  parser.add_argument('--binsize' , type=int, default=100, help='Window size for smoothing the signal.')
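A note on the `--fdr` argument added in this hunk: it is declared with `type=int` while its default is the float `0.05`. argparse applies `type` only to strings taken from the command line (and to string defaults), so the default survives unchanged, but a fractional value such as `--fdr 0.01` is rejected. The sketch below reproduces this with a stand-alone parser. `build_parser`, `try_parse`, and the `fdr_type` parameter are hypothetical names used only for illustration, and `type=float` is a suggested fix under the assumption that fractional thresholds are intended, not something this release does.

```python
import argparse

def build_parser(fdr_type):
    """Hypothetical helper: a minimal parser with only the --fdr option."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--fdr', type=fdr_type, default=0.05,
                        help='FDR threshold for the final result.')
    return parser

def try_parse(parser, argv):
    """Return the parsed --fdr value, or None if argparse rejects the input."""
    try:
        return parser.parse_args(argv).fdr
    except SystemExit:  # argparse prints the error to stderr, then exits
        return None

as_released = build_parser(int)   # mirrors the declaration in this hunk
suggested = build_parser(float)   # assumption: the intended type

default_value = try_parse(as_released, [])             # non-string default is kept as-is
rejected = try_parse(as_released, ['--fdr', '0.01'])   # int('0.01') raises, so argparse errors out
accepted = try_parse(suggested, ['--fdr', '0.01'])     # parsed as a float
```

With the released declaration, users can rely on the default but cannot pass a fractional threshold; only whole numbers such as `--fdr 1` would parse.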
@@ -49,6 +50,7 @@ def main():
49
50
  sgRNA_name = args.name
50
51
  pattern_exp = args.exp
51
52
  pattern_ctr = args.control
53
+ fdr_thresh = args.fdr
52
54
  binsize = args.binsize
53
55
  flank_max = args.flank_max
54
56
  flank_regions = args.flank_regions
@@ -93,7 +95,7 @@ def main():
93
95
  all_sample_files.extend( bdg_files )
94
96
  all_sample_files = pd.Series(all_sample_files)
95
97
  all_sample_names = pd.Series(all_sample_names)
96
-
98
+ print('your string pattern for experimental groups: ', pattern_exp)
97
99
  ctr_samples = []
98
100
  if pattern_ctr == 'none':
99
101
  if pattern_exp == 'all':
@@ -155,8 +157,11 @@ def main():
155
157
  df_bdg.columns = ['chr','start','end','residual']
156
158
  # Group df_bdg by chromosome
157
159
  sample_groups = df_bdg.groupby('chr')
160
+ # 2024.06.03. fix a bug that df_bdg has less chr than df_candidate
161
+ total_chr = df_bdg['chr'].unique()
162
+ df_candidate_sub_temp = df_candidate_sub[df_candidate_sub['chr'].isin(total_chr)]
158
163
  # Group df_candidate_sub by chromosome
159
- candidate_groups = df_candidate_sub.groupby('chr')
164
+ candidate_groups = df_candidate_sub_temp.groupby('chr')
160
165
 
161
166
  # Define an empty list to store the data for each chromosome
162
167
  chrom_list = []
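The hunk above restricts the candidate table to chromosomes that actually occur in the signal table before grouping, so that no candidate group is left without a matching signal group. A minimal plain-Python analogue of that step follows; the record layout and the `restrict_to_shared_chr` name are assumptions for illustration, since the package itself works on pandas DataFrames.

```python
from collections import defaultdict

def restrict_to_shared_chr(candidates, signal_rows):
    """Keep only candidates whose chromosome appears in the signal rows,
    then group the survivors by chromosome."""
    total_chr = {row['chr'] for row in signal_rows}
    groups = defaultdict(list)
    for cand in candidates:
        if cand['chr'] in total_chr:
            groups[cand['chr']].append(cand)
    return dict(groups)

# Illustrative data: signal exists only on chr6, as in the example dataset
signal_rows = [{'chr': 'chr6', 'start': 0, 'end': 100, 'residual': 1.5}]
candidates = [
    {'chr': 'chr6', 'start': 40, 'end': 63},
    {'chr': 'chr1', 'start': 10, 'end': 33},  # no signal on chr1, so dropped
]
candidate_groups = restrict_to_shared_chr(candidates, signal_rows)
```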
@@ -234,7 +239,8 @@ def main():
234
239
  df_score = pd.concat([df_score, df_exp, df_ctr], axis=1)
235
240
  else:
236
241
  df_score = pd.concat([df_score, df_exp], axis=1)
237
- df_score = df_score.copy()
242
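+ # 2024.06.03. When running the example data, only chr6 is present and all other chromosomes are NaN; not dropping them causes errors in later calculations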
243
+ df_score = df_score.dropna().copy()
238
244
  df_score.to_csv(output)
239
245
 
240
246
  ##########################
@@ -299,12 +305,13 @@ def main():
299
305
 
300
306
  # Remove sites with one-sided signal that have a higher-scoring site nearby
301
307
  # Since v2.1, cols_L and cols_R must be set manually
308
+ # 2024.01.26. Only the 1 kb window is checked now, but this approach still cannot resolve two similar sites within about 100-500 bp of each other
302
309
  if pattern_ctr != 'none':
303
- cols_L = ['exp_L_1000', 'exp_L_2000']
304
- cols_R = ['exp_R_1000', 'exp_R_2000']
310
+ cols_L = ['exp_L_1000']
311
+ cols_R = ['exp_R_1000']
305
312
  else:
306
- cols_L = ['L_1000', 'L_2000'] # df_score.columns[df_score.columns.str.contains('^L_\d+')]
307
- cols_R = ['R_1000', 'R_2000'] # df_score.columns[df_score.columns.str.contains('^R_\d+')]
313
+ cols_L = ['L_1000'] # df_score.columns[df_score.columns.str.contains('^L_\d+')]
314
+ cols_R = ['R_1000'] # df_score.columns[df_score.columns.str.contains('^R_\d+')]
308
315
  seq_score_thresh = np.power(1.25, seq_score_power)
309
316
  search_distance = 100000
310
317
  candidate_dup = list(df_result[((df_result[cols_R].max(axis=1)<=0)|(df_result[cols_L].max(axis=1)<=0))&(df_result['log2_track_score']>0.8)].index)
@@ -338,8 +345,10 @@ def main():
338
345
  df_result['fdr'] = offtracker.fdr(df_result['pv'])
339
346
  df_result['rank'] = range(1,len(df_result)+1)
340
347
  df_result.to_csv(output)
341
-
342
- df_output = df_result[df_result['fdr']<=0.05].copy()
348
+ # 2024.06.03. Ensure that sites with track_score>=2 are not filtered out by fdr<=fdr_thresh
349
+ bool_fdr = df_result['fdr']<=fdr_thresh
350
+ bool_score = df_result['track_score']>=2
351
+ df_output = df_result[bool_fdr|bool_score].copy()
343
352
  if pattern_ctr != 'none':
344
353
  df_output = df_output[['target_location', 'best_strand','best_target','deletion','insertion','mismatch',
345
354
  'exp_L_length', 'exp_R_length','ctr_L_length','ctr_R_length','L_length','R_length','signal_length',
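The change above keeps a site if it passes the FDR threshold or if its Track score is at least 2. A small plain-Python sketch of that OR-filter is given here, using made-up rows and a hypothetical `select_sites` helper; the package applies the same logic with boolean pandas masks (`bool_fdr|bool_score`).

```python
def select_sites(rows, fdr_thresh=0.05, score_thresh=2):
    """Keep rows that pass the FDR cutoff OR reach the Track score cutoff."""
    return [
        row for row in rows
        if row['fdr'] <= fdr_thresh or row['track_score'] >= score_thresh
    ]

rows = [  # illustrative values only, not real results
    {'site': 'A', 'fdr': 0.01, 'track_score': 3.1},  # passes both
    {'site': 'B', 'fdr': 0.20, 'track_score': 2.4},  # kept by score alone
    {'site': 'C', 'fdr': 0.03, 'track_score': 1.2},  # kept by FDR alone
    {'site': 'D', 'fdr': 0.50, 'track_score': 0.4},  # dropped
]
kept = [row['site'] for row in select_sites(rows)]
```

Site B is the case the 2024.06.03 comment targets: before this change, filtering on FDR alone would have removed it.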
@@ -0,0 +1,39 @@
1
+ #!/usr/bin/env python
2
+ # -*- coding: utf-8 -*-
3
+
4
+ import offtracker.X_offplot as xoffplot
5
+ import pandas as pd
6
+ import argparse
7
+ import os
8
+
9
+ def main():
10
+ parser = argparse.ArgumentParser()
11
+ parser.description='Draw the plot of the off-targets with genomic sequences.\nIf .pdf file is too large, try to use .png file instead.'
12
+ parser.add_argument('--result' , type=str, required=True, help='The file of Offtracker_result_{outname}.csv' )
13
+ parser.add_argument('--sgrna' , type=str, required=True, help='Not including PAM' )
14
+ parser.add_argument('--pam' , type=str, default='NGG', help='PAM sequence. Default is "NGG".' )
15
+ parser.add_argument('--output' , type=str, default='same', help='The output file. Default is Offtracker_result_{outname}.pdf')
16
+
17
+ args = parser.parse_args()
18
+ if args.output == 'same':
19
+ dir_savefig = args.result.replace('.csv', '.pdf')
20
+ else:
21
+ dir_savefig = args.output
22
+
23
+ outname = os.path.basename(args.result).replace('Offtracker_result_', '').replace('.csv', '')
24
+ gRNA = args.sgrna
25
+ PAM = args.pam
26
+ full_seq = gRNA + PAM
27
+
28
+ df_result = pd.read_csv(args.result)
29
+ n_pos = len(df_result)
30
+
31
+ xoffplot.offtable(df_result, full_seq, length_pam = len(PAM), col_seq='target', threshold=2,
32
+ title=f'{outname} ({n_pos} sites)',
33
+ savefig=dir_savefig)
34
+
35
+ return f'The plot is saved as {dir_savefig}'
36
+
37
+ if __name__ == '__main__' :
38
+ result = main()
39
+ print(result)
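The script above derives the default figure path and the plot title from `--result` by string replacement. The sketch below mirrors that derivation in a stand-alone function so the behaviour can be checked without plotting; `derive_names` and the example paths are hypothetical. Note that `str.replace` substitutes every occurrence of `.csv` in the path, so a directory name containing `.csv` would be rewritten as well.

```python
import os

def derive_names(result_path, output='same'):
    """Mirror how the default figure path and the title name are derived."""
    if output == 'same':
        savefig = result_path.replace('.csv', '.pdf')
    else:
        savefig = output
    outname = (os.path.basename(result_path)
               .replace('Offtracker_result_', '')
               .replace('.csv', ''))
    return savefig, outname

# Default: the PDF is written next to the CSV
savefig, outname = derive_names('/data/Offtracker_result_Cas9_VEGFA_293.csv')

# Explicit --output: the path, including its suffix, is used unchanged
png_path, _ = derive_names('/data/Offtracker_result_Cas9_VEGFA_293.csv',
                           output='/data/plot.png')
```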
@@ -9,7 +9,7 @@ from setuptools import find_packages, setup, Command
9
9
 
10
10
  #
11
11
  NAME = 'offtracker'
12
- DESCRIPTION = 'Track-seq data analysis'
12
+ DESCRIPTION = 'Tracking-seq data analysis'
13
13
  AUTHOR = 'Runda Xu'
14
14
  EMAIL = 'runda.xu@foxmail.com'
15
15
  URL = 'https://github.com/Lan-lab/offtracker'
@@ -49,7 +49,10 @@ setup(
49
49
  python_requires=REQUIRES_PYTHON,
50
50
  packages=find_packages(),
51
51
  package_data={'offtracker': ['mapping/*']},
52
- scripts = ['scripts/offtracker_config.py','scripts/offtracker_candidates.py','scripts/offtracker_analysis.py'],
52
+ scripts = ['scripts/offtracker_config.py',
53
+ 'scripts/offtracker_candidates.py',
54
+ 'scripts/offtracker_analysis.py',
55
+ 'scripts/offtracker_plot.py'],
53
56
  install_requires=REQUIRED,
54
57
  include_package_data=True
55
58
  )
offtracker-2.7.7/PKG-INFO DELETED
@@ -1,146 +0,0 @@
1
- Metadata-Version: 2.1
2
- Name: offtracker
3
- Version: 2.7.7
4
- Summary: Track-seq data analysis
5
- Home-page: https://github.com/Lan-lab/offtracker
6
- Author: Runda Xu
7
- Author-email: runda.xu@foxmail.com
8
- Requires-Python: >=3.6.0
9
- Description-Content-Type: text/markdown
10
- License-File: LICENSE.txt
11
-
12
-
13
- OFF-TRACKER
14
- =======================
15
-
16
- OFF-TRACKER is an end to end pipeline of Track-seq data analysis for detecting off-target sites of any genome editing tools that generate double-strand breaks (DSBs) or single-strand breaks (SSBs).
17
-
18
- System requirements
19
- -----
20
- * Linux/Unix
21
- * Python >= 3.6
22
-
23
- Dependency
24
- -----
25
-
26
- ```bash
27
- # We recommend creating a new enviroment using mamba/conda to avoid compatibility problems
28
- # If you don't use mamba, just replace the code with conda
29
- mamba create -n offtracker -c bioconda blast snakemake pybedtools
30
- ```
31
-
32
-
33
- Installation
34
- -----
35
-
36
- ```bash
37
- # activate the environment
38
- conda activate offtracker
39
-
40
- # Direct installation with pip
41
- pip install offtracker
42
-
43
- # (Alternative) Download the offtracker from github
44
- git clone https://github.com/Lan-lab/offtracker.git
45
- cd offtracker
46
- pip install .
47
- ```
48
-
49
-
50
- Before analyzing samples
51
- -----
52
-
53
- ```bash
54
- # Build blast index (only need once for each genome)
55
- makeblastdb -input_type fasta -title hg38 -dbtype nucl -parse_seqids \
56
- -in /Your_Path_To_Reference/hg38_genome.fa \
57
- -out /Your_Path_To_Reference/hg38_genome.blastdb \
58
- -logfile /Your_Path_To_Reference/hg38_genome.blastdb.log
59
-
60
- # Build chromap index (only need once for each genome)
61
- chromap -i -r /Your_Path_To_Reference/hg38_genome.fa \
62
- -o /Your_Path_To_Reference/hg38_genome.chromap.index
63
-
64
- # Generate candidate regions by sgRNA sequence (need once for each genome and sgRNA)
65
- offtracker_candidates.py -t 8 -g hg38 \
66
- -r /Your_Path_To_Reference/hg38_genome.fa \
67
- -b /Your_Path_To_Reference/hg38_genome.blastdb \
68
- --name 'HEK4' --sgrna 'GGCACTGCGGCTGGAGGTGG' --pam 'NGG' \
69
- -o /Your_Path_To_Candidates
70
-
71
- ```
72
-
73
- Strand-specific mapping of Track-seq data
74
- -----
75
-
76
- ```bash
77
- # Generate snakemake config file
78
- offtracker_config.py -t 8 -g hg38 --blacklist hg38 \
79
- -r /Your_Path_To_Reference/hg38_genome.fa \
80
- -i /Your_Path_To_Reference/hg38_genome.chromap.index \
81
- -f /Your_Path_To_Fastq \
82
- -o /Your_Path_To_Output \
83
- --subfolder 0
84
-
85
- # --subfolder: If different samples are in seperate folders, set this to 1
86
- # -o: Default is outputting to /Your_Path_To_Fastq
87
-
88
- # Run the snakemake program
89
- cd /Your_Path_To_Fastq
90
- snakemake -np # dry run
91
- nohup snakemake --cores 16 1>snakemake.log 2>snakemake.err &
92
-
93
- ## about cores
94
- # --cores of snakemake must be larger than -t of offtracker_config.py
95
- # parallel number = cores/t
96
-
97
- ## about output
98
- # This part will generate "*.fw.scaled.bw" and ".rv.scaled.bw" for IGV visualization
99
- # "*.fw.bed" and "*.rv.bed" are used in the next part.
100
- ```
101
-
102
-
103
- Analyzing the off-target sites
104
- -----
105
-
106
- ```bash
107
- # In this part, multiple samples in the same condition can be analyzed in a single run by pattern recogonization of sample names
108
-
109
- offtracker_analysis.py -g hg38 --name "HEK4" \
110
- --exp 'Cas9_HEK4.*293' \
111
- --control 'control' \
112
- --outname 'Cas9_HEK4_293' \
113
- -f /Your_Path_To_Output \
114
- --seqfolder /Your_Path_To_Candidates
115
-
116
- # --name: the same as that in offtracker_candidates.py
117
- # --exp/--control: add one or multiple patterns of file name in regex
118
-
119
-
120
- # This step will generate Trackseq_result_{outname}.csv
121
- # Intermediate files are saved in ./temp folder, which can be deleted
122
- # Keeping the intermediate files can make the analysis faster if involving previously analyzed samples (e.g. using the same control samples for different analyses)
123
- ```
124
-
125
-
126
- Note1
127
- --------------
128
- The default setting only includes chr1-chr22, chrX, chrY, and chrM.
129
-
130
- Please make sure the reference genome contains "chr" at the beginning.
131
-
132
- If you have requirement for other chromosomes or species other than human/mouse, please post an issue.
133
-
134
- Note2
135
- --------------
136
- Currently, this software is only ready-to-use for mm10 and hg38.
137
-
138
- For any other genome, say hg19, please add genome size file named "hg19.chrom.sizes" to .\offtracker\mapping before install.
139
-
140
- Besides, add "--blacklist none" or "--blacklist Your_Blacklist" when running offtracker_config.py
141
-
142
- Note3
143
- --------------
144
- The FDR in the Track-seq result is not rigorous to the real off-target probability.
145
- It is strongly recommended to observe the "fw.scaled.bw" and "rv.scaled.bw" using IGV to check each target location from the Track-seq result.
146
-
@@ -1,134 +0,0 @@
1
- OFF-TRACKER
2
- =======================
3
-
4
- OFF-TRACKER is an end to end pipeline of Track-seq data analysis for detecting off-target sites of any genome editing tools that generate double-strand breaks (DSBs) or single-strand breaks (SSBs).
5
-
6
- System requirements
7
- -----
8
- * Linux/Unix
9
- * Python >= 3.6
10
-
11
- Dependency
12
- -----
13
-
14
- ```bash
15
- # We recommend creating a new enviroment using mamba/conda to avoid compatibility problems
16
- # If you don't use mamba, just replace the code with conda
17
- mamba create -n offtracker -c bioconda blast snakemake pybedtools
18
- ```
19
-
20
-
21
- Installation
22
- -----
23
-
24
- ```bash
25
- # activate the environment
26
- conda activate offtracker
27
-
28
- # Direct installation with pip
29
- pip install offtracker
30
-
31
- # (Alternative) Download the offtracker from github
32
- git clone https://github.com/Lan-lab/offtracker.git
33
- cd offtracker
34
- pip install .
35
- ```
36
-
37
-
38
- Before analyzing samples
39
- -----
40
-
41
- ```bash
42
- # Build blast index (only need once for each genome)
43
- makeblastdb -input_type fasta -title hg38 -dbtype nucl -parse_seqids \
44
- -in /Your_Path_To_Reference/hg38_genome.fa \
45
- -out /Your_Path_To_Reference/hg38_genome.blastdb \
46
- -logfile /Your_Path_To_Reference/hg38_genome.blastdb.log
47
-
48
- # Build chromap index (only need once for each genome)
49
- chromap -i -r /Your_Path_To_Reference/hg38_genome.fa \
50
- -o /Your_Path_To_Reference/hg38_genome.chromap.index
51
-
52
- # Generate candidate regions by sgRNA sequence (need once for each genome and sgRNA)
53
- offtracker_candidates.py -t 8 -g hg38 \
54
- -r /Your_Path_To_Reference/hg38_genome.fa \
55
- -b /Your_Path_To_Reference/hg38_genome.blastdb \
56
- --name 'HEK4' --sgrna 'GGCACTGCGGCTGGAGGTGG' --pam 'NGG' \
57
- -o /Your_Path_To_Candidates
58
-
59
- ```
60
-
61
- Strand-specific mapping of Track-seq data
62
- -----
63
-
64
- ```bash
65
- # Generate snakemake config file
66
- offtracker_config.py -t 8 -g hg38 --blacklist hg38 \
67
- -r /Your_Path_To_Reference/hg38_genome.fa \
68
- -i /Your_Path_To_Reference/hg38_genome.chromap.index \
69
- -f /Your_Path_To_Fastq \
70
- -o /Your_Path_To_Output \
71
- --subfolder 0
72
-
73
- # --subfolder: If different samples are in seperate folders, set this to 1
74
- # -o: Default is outputting to /Your_Path_To_Fastq
75
-
76
- # Run the snakemake program
77
- cd /Your_Path_To_Fastq
78
- snakemake -np # dry run
79
- nohup snakemake --cores 16 1>snakemake.log 2>snakemake.err &
80
-
81
- ## about cores
82
- # --cores of snakemake must be larger than -t of offtracker_config.py
83
- # parallel number = cores/t
84
-
85
- ## about output
86
- # This part will generate "*.fw.scaled.bw" and ".rv.scaled.bw" for IGV visualization
87
- # "*.fw.bed" and "*.rv.bed" are used in the next part.
88
- ```
89
-
90
-
91
- Analyzing the off-target sites
92
- -----
93
-
94
- ```bash
95
- # In this part, multiple samples in the same condition can be analyzed in a single run by pattern recogonization of sample names
96
-
97
- offtracker_analysis.py -g hg38 --name "HEK4" \
98
- --exp 'Cas9_HEK4.*293' \
99
- --control 'control' \
100
- --outname 'Cas9_HEK4_293' \
101
- -f /Your_Path_To_Output \
102
- --seqfolder /Your_Path_To_Candidates
103
-
104
- # --name: the same as that in offtracker_candidates.py
105
- # --exp/--control: add one or multiple patterns of file name in regex
106
-
107
-
108
- # This step will generate Trackseq_result_{outname}.csv
109
- # Intermediate files are saved in ./temp folder, which can be deleted
110
- # Keeping the intermediate files can make the analysis faster if involving previously analyzed samples (e.g. using the same control samples for different analyses)
111
- ```
112
-
113
-
114
- Note1
115
- --------------
116
- The default setting only includes chr1-chr22, chrX, chrY, and chrM.
117
-
118
- Please make sure the reference genome contains "chr" at the beginning.
119
-
120
- If you have requirement for other chromosomes or species other than human/mouse, please post an issue.
121
-
122
- Note2
123
- --------------
124
- Currently, this software is only ready-to-use for mm10 and hg38.
125
-
126
- For any other genome, say hg19, please add genome size file named "hg19.chrom.sizes" to .\offtracker\mapping before install.
127
-
128
- Besides, add "--blacklist none" or "--blacklist Your_Blacklist" when running offtracker_config.py
129
-
130
- Note3
131
- --------------
132
- The FDR in the Track-seq result is not rigorous to the real off-target probability.
133
- It is strongly recommended to observe the "fw.scaled.bw" and "rv.scaled.bw" using IGV to check each target location from the Track-seq result.
134
-