ORForise 1.5.1__py3-none-any.whl → 1.6.1__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. ORForise/Aggregate_Compare.py +2 -4
  2. ORForise/Annotation_Compare.py +16 -53
  3. ORForise/Annotation_Intersector.py +726 -0
  4. ORForise/Aux/TabToGFF/TabToGFF.py +140 -0
  5. ORForise/Convert_To_GFF.py +139 -0
  6. ORForise/GFF_Adder.py +454 -179
  7. ORForise/List_Tools.py +63 -0
  8. ORForise/StORForise.py +8 -4
  9. ORForise/Tools/EasyGene/EasyGene.py +13 -1
  10. ORForise/Tools/{GLIMMER_3/GLIMMER_3.py → GLIMMER3/GLIMMER3.py} +2 -2
  11. ORForise/Tools/GLIMMER3/__init__.py +0 -0
  12. ORForise/Tools/{GeneMark_HA/GeneMark_HA.py → GeneMarkHA/GeneMarkHA.py} +1 -1
  13. ORForise/Tools/GeneMarkHA/__init__.py +0 -0
  14. ORForise/Tools/Prodigal/Prodigal.py +13 -1
  15. ORForise/utils.py +4 -1
  16. orforise-1.6.1.dist-info/METADATA +1038 -0
  17. {orforise-1.5.1.dist-info → orforise-1.6.1.dist-info}/RECORD +29 -24
  18. {orforise-1.5.1.dist-info → orforise-1.6.1.dist-info}/entry_points.txt +6 -2
  19. ORForise/GFF_Intersector.py +0 -192
  20. orforise-1.5.1.dist-info/METADATA +0 -427
  21. /ORForise/{Tools → Aux}/StORF_Undetected/Completely_Undetected/Completey_Undetected.py +0 -0
  22. /ORForise/{Tools/GLIMMER_3 → Aux/StORF_Undetected/Completely_Undetected}/__init__.py +0 -0
  23. /ORForise/{Tools → Aux}/StORF_Undetected/StORF_Undetected.py +0 -0
  24. /ORForise/{Tools/GeneMark_HA → Aux/StORF_Undetected}/__init__.py +0 -0
  25. /ORForise/{Tools/StORF_Undetected/Completely_Undetected → Aux/StORF_Undetected/unvitiated_Genes}/__init__.py +0 -0
  26. /ORForise/{Tools → Aux}/StORF_Undetected/unvitiated_Genes/unvitiated_Missed_Genes.py +0 -0
  27. /ORForise/{Tools/StORF_Undetected → Aux/TabToGFF}/__init__.py +0 -0
  28. /ORForise/{Tools/StORF_Undetected/unvitiated_Genes → Aux}/__init__.py +0 -0
  29. {orforise-1.5.1.dist-info → orforise-1.6.1.dist-info}/WHEEL +0 -0
  30. {orforise-1.5.1.dist-info → orforise-1.6.1.dist-info}/licenses/LICENSE +0 -0
  31. {orforise-1.5.1.dist-info → orforise-1.6.1.dist-info}/top_level.txt +0 -0
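Several entries in the list above use a brace notation, `{old → new}`, to mark the part of a path that changed in a rename. As a minimal sketch of how such an entry expands into its old and new paths, the helper below handles a single brace group per entry; the name `expand_rename` is hypothetical and is not part of ORForise or of the diff viewer, and it expects only the path portion of an entry, without the trailing `+N -M` counts.

```python
import re
from typing import Tuple

# One "{old → new}" group. The arrow character and spacing are assumed to
# match the file listing above; this is not an official format definition.
_RENAME = re.compile(r"\{(?P<old>[^{}]*?)\s*→\s*(?P<new>[^{}]*?)\}")


def expand_rename(entry: str) -> Tuple[str, str]:
    """Return (old_path, new_path) for the path part of a file-list entry.

    Entries without a brace group are not renames, so both paths are equal.
    """
    match = _RENAME.search(entry)
    if match is None:
        return entry, entry
    prefix = entry[:match.start()]
    suffix = entry[match.end():]
    return prefix + match["old"] + suffix, prefix + match["new"] + suffix


if __name__ == "__main__":
    examples = [
        "ORForise/Tools/{GLIMMER_3/GLIMMER_3.py → GLIMMER3/GLIMMER3.py}",
        "{orforise-1.5.1.dist-info → orforise-1.6.1.dist-info}/RECORD",
        "ORForise/utils.py",
    ]
    for entry in examples:
        old_path, new_path = expand_rename(entry)
        print(old_path, "->", new_path)
```

Read this way, the renamed tool modules move from `ORForise/Tools/GLIMMER_3/` to `ORForise/Tools/GLIMMER3/` and from `ORForise/Tools/GeneMark_HA/` to `ORForise/Tools/GeneMarkHA/`, so code importing them by the old module path would likely need updating.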
orforise-1.5.1.dist-info/METADATA +0 -427
@@ -1,427 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: ORForise
3
- Version: 1.5.1
4
- Summary: ORForise - Platform for analysing and comparing Prokaryote CoDing Sequence (CDS) Gene Predictions.
5
- Home-page: https://github.com/NickJD/ORForise
6
- Author: Nicholas Dimonaco
7
- Author-email: nicholas@dimonaco.co.uk
8
- Project-URL: Bug Tracker, https://github.com/NickJD/ORForise/issues
9
- Classifier: Programming Language :: Python :: 3
10
- Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
11
- Classifier: Operating System :: OS Independent
12
- Requires-Python: >=3.6
13
- Description-Content-Type: text/markdown
14
- License-File: LICENSE
15
- Requires-Dist: numpy
16
- Dynamic: license-file
17
-
18
- # ORForise - Prokaryote Genome Annotation Analysis and Comparison Platform
19
- ## Published in Bioinformatics : https://academic.oup.com/bioinformatics/article/38/5/1198/6454948
20
- ### Platform for analysing and comparing Prokaryote CoDing Sequence (CDS) Gene Predictions.
21
- ### Novel genome annotations can be compared to a provided reference annotation from Ensembl and predictions from other tools (or any given GFF annotation) .
22
-
23
- # Requirements and Installation:
24
-
25
- ### The ORForise platform is written in Python (3.6-3.9) and only requires the NumPy library (should be installed automatically by pip when installing ORForise) which is standard in most base installations of Python3.
26
-
27
- ## Intallation:
28
-
29
- ### The ORForise platform is available via the pip Python package manager ```pip3 install ORForise```.
30
- ### Consider using '--no-cache-dir' with pip to ensure the download of the newest version of the package.
31
-
32
- ## Required Files:
33
-
34
- To run, you need:
35
- * Input Genome FASTA and corresponding GFF file (or CDS predictions with the annotated genes for the genome you want to use as reference in one of the tool output formats listed below).
36
- * A prediction output from one of the compatible tools for the same genome.
37
-
38
- ### How to add your own Genome:
39
-
40
- Corresponding FASTA and GFF files must be provided for the genome the analysis is to be performed on, including the corresponding output of any tools to compare.
41
-
42
- ### How to add your own tool:
43
-
44
- If the new tool reports its predictions in GFF you can present ORForise with "GFF" for either the reference ```-rt``` or prediction ```-t``` option.
45
- If the tool uses another non-standard format, a request can be made to add it as an option via GitHub.
46
-
47
-
48
- ### Testing:
49
- Precomputed testing and data which includes example input and output files for all tools presented below is available in the `~ORForise/Testing` directory of the GitHub repository.
50
- Example output files from ```Annotation-Compare```, ```GFF-Adder``` and ```GFF-Intersector``` are made available to validate installation.
51
-
52
-
53
- ## CDS Prediction Analysis:
54
-
55
- ### Use-cases: (Running if via pip)
56
-
57
- For Help: ```Annotation-Compare -h ```
58
-
59
- ```python
60
- ORForise v1.5.1: Annotatione-Compare Run Parameters.
61
-
62
- Required Arguments:
63
- -dna GENOME_DNA Genome DNA file (.fa) which both annotations are based on
64
- -ref REFERENCE_ANNOTATION
65
- Which reference annotation file to use as reference?
66
- -t TOOL Which tool to analyse? (Prodigal)
67
- -tp TOOL_PREDICTION Tool genome prediction file (.gff) - Different Tool Parameters are compared individually via
68
- separate files
69
-
70
- Optional Arguments:
71
- -rt REFERENCE_TOOL What type of Annotation to compare to? -- Leave blank for Ensembl reference- Provide tool
72
- name to compare output from two tools
73
-
74
- Output:
75
- -o OUTDIR Define directory where detailed output should be places
76
- -n OUTNAME Define output filename(s) prefix - If not provided, filename of reference annotation file will be used- <outname>_<contig_id>_ORF_Comparison.csv
77
-
78
- Misc:
79
- -v {True,False} Default - False: Print out runtime status
80
- ```
81
-
82
- ## Compare a novel genome annotation to an Ensembl annotation:
83
-
84
- Genome annotation is a difficult process, even for Prokaryotes. ORForise allows the direct and systematic analysis of
85
- a novel CDS prediction from a wide selection of tools to a reference Genome Annotation, such as those provided by
86
- Ensembl Bacteria.
87
-
88
- #### Example: Installation through pip will allow user to call the programs directly from the ORForise package.
89
- ```python
90
- Annotation-Compare -dna ~/Testing/Myco.fa -ref ~/Testing/Myco.gff -t Prodigal -tp ~/Testing/Prodigal_Myco.gff
91
- ```
92
- ### Compare different novel annotations with each other on a single Genome:
93
-
94
- If a reference Genome Annotation is not available or a direct comparison between two or more tools is wanted,
95
- ORForise can be used as the example below.
96
-
97
- ## Aggregate CDS Prediction Analysis:
98
-
99
- ### Use-cases: (Running if via pip)
100
-
101
- For Help: ```Aggregate-Compare -h ```
102
-
103
- ```python
104
- ORForise v1.5.1: Aggregate-Compare Run Parameters.
105
-
106
- Required Arguments:
107
- -dna GENOME_DNA Genome DNA file (.fa) which both annotations are based on
108
- -t TOOLS Which tools to analyse? (Prodigal,GeneMarkS)
109
- -tp TOOL_PREDICTIONS Tool genome prediction file (.gff) - Providefile locations for each tool comma separated
110
- -ref REFERENCE_ANNOTATION
111
- Which reference annotation file to use as reference?
112
-
113
- Optional Arguments:
114
- -rt REFERENCE_TOOL What type of Annotation to compare to? -- Leave blank for Ensembl reference- Provide tool
115
- name to compare output from two tools
116
-
117
- Output:
118
- -o OUTNAME Define full output filename (format is CSV) - If not provided, summary will be printed to
119
- std-out
120
-
121
- Misc:
122
- -v {True,False} Default - False: Print out runtime status
123
- ```
124
-
125
- #### Example:
126
- ```python
127
- Aggregate-Compare -ref ~/Testing/Myco.gff -dna ~/Testing/Myco.fa -t Prodigal,TransDecoder,GeneMark_S_2 -tp ~/Testing/Prodigal_Myco.gff,~/Testing/TransDecoder_Myco.gff,~/Testing/GeneMark_S_2_Myco.gff
128
- ```
129
- This will compare the Aggregate the predictions of Prodigal, TransDecoder and GLIMMER 3 against the Mycoplasma reference annotation provided by
130
- Ensembl Bacteria.
131
-
132
- ## Annotation Comparison Output - The output format is the same for Annotation_Compare and Aggregate_Compare:
133
- ### Print to screen example - Prodigal prediction compared to Ensembl Bacteria reference annotation of *Escherichia coli*:
134
- ```bash
135
- Annotation-Compare.py -ref ./Testing/Myco.gff -dna ./Testing/Myco.fa -t Prodigal -tp ./Testing/Prodigal_Myco.gff
136
- Genome Used: Myco
137
- Reference Used: Testing/Myco.gff
138
- Tool Compared: Prodigal
139
- Perfect Matches:128[476] -26.89%
140
- Partial Matches:62[476] - 13.03%
141
- Missed Genes:286[476] - 60.08%
142
- Complete
143
- ```
144
-
145
- ``` bash
146
- Aggregate-Compare -ref ./Testing/Myco.gff -dna ./Testing/Myco.fa -t Prodigal,TransDecoder,GeneMark_S_2 -tp ./Testing/Prodigal_Myco.gff,./Testing/TransDecoder_Myco.gff,./Testing/GeneMark_S_2_Myco.gff
147
- Prodigal
148
- TransDecoder
149
- GeneMark_S_2
150
- Match filtered out
151
- Match filtered out
152
- Match filtered out
153
- Match filtered out
154
- Match filtered out
155
- Match filtered out
156
- Genome Used: Myco
157
- Reference Used: ./Testing/Myco.gff
158
- Tools Compared: Prodigal,TransDecoder,GeneMark_S_2
159
- Perfect Matches:132[476]
160
- Partial Matches:58[476]
161
- Missed Genes:286[476]
162
- ```
163
-
164
- This is the default output of the comparison tools.
165
-
166
- ### '-o' Example output to CSV file - Prodigal prediction compared to Ensembl Bacteria reference annotation of *Escherichia coli*:
167
- The output is designed to be human-readable and interpretable by the included 'ORForise_Analysis' scripts.
168
- The example below presents the 12 'Representative' and 72 'All' Metrics but only shows one entry for each of the induvidual prediction reports (Perfect_Match_Genes,Partial_Match_Genes,Missed_Genes,Predicted_CDS_Without_Corresponding_Gene_in_Reference,Predicted_CDSs_Which_Detected_more_than_one_Gene).
169
-
170
- ```csv
171
- Representative_Metrics:
172
- Percentage_of_Genes_Detected,Percentage_of_ORFs_that_Detected_a_Gene,Percent_Difference_of_All_ORFs,Median_Length_Difference,Percentage_of_Perfect_Matches,Median_Start_Difference_of_Matched_ORFs,Median_Stop_Difference_of_Matched_ORFs,Percentage_Difference_of_Matched_Overlapping_CDSs,Percent_Difference_of_Short-Matched-ORFs,Precision,Recall,False_Discovery_Rate
173
- 39.92,19.10,109.03,-62.17,67.37,67.5,-85.5,-83.71,-17.39,0.19,0.40,0.81
174
- All_Metrics:
175
- Number_of_ORFs,Percent_Difference_of_All_ORFs,Number_of_ORFs_that_Detected_a_Gene,Percentage_of_ORFs_that_Detected_a_Gene,Number_of_Genes_Detected,Percentage_of_Genes_Detected,Median_Length_of_All_ORFs,Median_Length_Difference,Minimum_Length_of_All_ORFs,Minimum_Length_Difference,Maximum_Length_of_All_ORFs,Maximum_Length_Difference,Median_GC_content_of_All_ORFs,Percent_Difference_of_All_ORFs_Median_GC,Median_GC_content_of_Matched_ORFs,Percent_Difference_of_Matched_ORF_GC,Number_of_ORFs_which_Overlap_Another_ORF,Percent_Difference_of_Overlapping_ORFs,Maximum_ORF_Overlap,Median_ORF_Overlap,Number_of_Matched_ORFs_Overlapping_Another_ORF,Percentage_Difference_of_Matched_Overlapping_CDSs,Maximum_Matched_ORF_Overlap,Median_Matched_ORF_Overlap,Number_of_Short-ORFs,Percent_Difference_of_Short-ORFs,Number_of_Short-Matched-ORFs,Percent_Difference_of_Short-Matched-ORFs,Number_of_Perfect_Matches,Percentage_of_Perfect_Matches,Number_of_Perfect_Starts,Percentage_of_Perfect_Starts,Number_of_Perfect_Stops,Percentage_of_Perfect_Stops,Number_of_Out_of_Frame_ORFs,Number_of_Matched_ORFs_Extending_a_Coding_Region,Percentage_of_Matched_ORFs_Extending_a_Coding_Region,Number_of_Matched_ORFs_Extending_Start_Region,Percentage_of_Matched_ORFs_Extending_Start_Region,Number_of_Matched_ORFs_Extending_Stop_Region,Percentage_of_Matched_ORFs_Extending_Stop_Region,Number_of_All_ORFs_on_Positive_Strand,Percentage_of_All_ORFs_on_Positive_Strand,Number_of_All_ORFs_on_Negative_Strand,Percentage_of_All_ORFs_on_Negative_Strand,Median_Start_Difference_of_Matched_ORFs,Median_Stop_Difference_of_Matched_ORFs,ATG_Start_Percentage,GTG_Start_Percentage,TTG_Start_Percentage,ATT_Start_Percentage,CTG_Start_Percentage,Other_Start_Codon_Percentage,TAG_Stop_Percentage,TAA_Stop_Percentage,TGA_Stop_Percentage,Other_Stop_Codon_Percentage,True_Positive,False_Positive,False_Negative,Precision,Recall,False_Discovery_Rate,Nucleotide_True_Positive,Nucleotide_False_Positive,Nucleotide_True_Negative,Nucleotide_False_Negative,Nucleotide_Precision,Nucleotide_Recall,Nucleotide_False_Discovery_Rate,ORF_Nucleotide_Coverage_of_Genome,Matched_ORF_Nucleotide_Coverage_of_Genome
176
- 995,109.03,190,19.10,190,39.92,335.0,-62.17,89,-21.24,3152,-41.81,31.50,0.20,32.83,4.42,279,26.24,135,0.00,36,-83.71,31,4.50,443,1826.09,19,-17.39,128,67.37,162,85.26,154,81.05,0,0,0.00,4,2.11,0,0.00,570,0.57,425,0.43,67.5,-85.5,63.12,15.28,21.61,0.00,0.00,0.00,11.06,27.44,61.51,0.00,0.40,1.69,0.60,0.19,0.40,0.81,0.82,0.31,0.69,0.18,0.96,0.82,0.04,77.15,24.47
177
- CDS_Gene_Coverage_of_Genome:
178
- 90.62
179
- Start_Position_Difference:
180
- -78,33,93,294,144,408,3,18,156,-42,45,90,333,333,-39,111,201,93,120,-354,-150,-366,117,-138,-240,123,-153,-51
181
- Stop_Position_Difference:
182
- -192,-147,108,-216,87,-678,-96,-156,-321,-240,-168,-162,-51,-126,-33,-3,-93,-12,-204,-189,-156,237,-45,-219,-201,-537,-30,-78,159,243,60,21,15,183,288,6
183
- Alternative_Starts_Predicted:
184
-
185
- Alternative_Stops_Predicted:
186
-
187
- Undetected_Gene_Metrics:
188
- ATG_Start ,GTG_Start ,TTG_Start ,ATT_Start ,CTG_Start ,Alternative_Start_Codon ,TGA_Stop ,TAA_Stop ,TAG_Stop ,Alternative_Stop_Codon ,Median_Length ,ORFs_on_Positive_Strand ,ORFs_on_Negative_Strand
189
- 88.46,7.69,3.85,0.00,0.00,0.00,0.00,74.13,25.87,0.00,1047.50,156,130
190
- Perfect_Match_Genes:
191
- >Myco_686_1828_+
192
- ATGAAAATATTAATTAATAAAAGTGAATTGAATAAAATTTTGAAAAAAATGAATAACGTTATTATTTCCAATAACAAAATAAAACCACATCATTCATATTTTTTAATAGAGGCAAAAGAAAAAGAAATAAACTTTTATGCTAACAATGAATACTTTTCTGTCAAATGTAATTTAAATAAAAATATTGATATTCTTGAACAAGGCTCCTTAATTGTTAAAGGAAAAATTTTTAACGATCTTATTAATGGCATAAAAGAAGAGATTATTACTATTCAAGAAAAAGATCAAACACTTTTGGTTAAAACAAAAAAAACAAGTATTAATTTAAACACAATTAATGTGAATGAATTTCCAAGAATAAGGTTTAATGAAAAAAACGATTTAAGTGAATTTAATCAATTCAAAATAAATTATTCACTTTTAGTAAAAGGCATTAAAAAAATTTTTCACTCAGTTTCAAATAATCGTGAAATATCTTCTAAATTTAATGGAGTAAATTTCAATGGATCCAATGGAAAAGAAATATTTTTAGAAGCTTCTGACACTTATAAACTATCTGTTTTTGAGATAAAGCAAGAAACAGAACCATTTGATTTCATTTTGGAGAGTAATTTACTTAGTTTCATTAATTCTTTTAATCCTGAAGAAGATAAATCTATTGTTTTTTATTACAGAAAAGATAATAAAGATAGCTTTAGTACAGAAATGTTGATTTCAATGGATAACTTTATGATTAGTTACACATCGGTTAATGAAAAATTTCCAGAGGTAAACTACTTTTTTGAATTTGAACCTGAAACTAAAATAGTTGTTCAAAAAAATGAATTAAAAGATGCACTTCAAAGAATTCAAACTTTGGCTCAAAATGAAAGAACTTTTTTATGCGATATGCAAATTAACAGTTCTGAATTAAAAATAAGAGCTATTGTTAATAATATCGGAAATTCTCTTGAGGAAATTTCTTGTCTTAAATTTGAAGGTTATAAACTTAATATTTCTTTTAACCCAAGTTCTCTATTAGATCACATAGAGTCTTTTGAATCAAATGAAATAAATTTTGATTTCCAAGGAAATAGTAAGTATTTTTTGATAACCTCTAAAAGTGAACCTGAACTTAAGCAAATATTGGTTCCTTCAAGATAA
193
-
194
- >Myco_4812_7322_+
195
- ATGGCAAAGCAACAAGATCAAGTAGATAAGATTCGTGAAAACTTAGACAATTCAACTGTCAAAAGTATTTCATTAGCAAATGAACTTGAGCGTTCATTCATGGAATATGCTATGTCAGTTATTGTTGCTCGTGCTTTACCTGATGCTAGAGATGGACTTAAACCAGTTCATCGTCGTGTTCTTTATGGTGCTTATATTGGTGGCATGCACCATGATCGTCCTTTTAAAAAGTCTGCGAGGATTGTTGGTGATGTAATGAGTAAATTCCACCCTCATGGTGATATGGCAATATATGACACCATGTCAAGAATGGCTCAAGACTTTTCATTAAGATACCTTTTAATTGATGGTCATGGTAATTTTGGTTCTATAGATGGTGATAGACCTGCTGCACAACGTTATACAGAAGCAAGATTATCTAAACTTGCAGCAGAACTTTTAAAAGATATTGATAAAGATACAGTTGACTTTATTGCTAATTATGATGGTGAGGAAAAAGAACCAACTGTTCTACCAGCAGCTTTCCCTAACTTACTTGCAAATGGTTCTAGTGGGATTGCAGTTGGAATGTCAACATCTATTCCTTCCCATAATCTCTCTGAATTAATTGCGGGTTTAATCATGTTAATTGATAATCCTCAATGCACTTTTCAAGAATTATTAACTGTAATTAAAGGACCTGATTTTCCAACAGGAGCTAACATTATCTACACAAAAGGAATTGAAAGCTACTTTGAAACAGGTAAAGGCAATGTAGTAATTCGTTCTAAAGTTGAGATAGAACAATTGCAAACAAGAAGTGCATTAGTTGTAACTGAAATTCCTTACATGGTTAACAAAACTACCTTAATTGAAAAGATTGTAGAACTTGTTAAAGCTGAAGAGATTTCAGGAATTGCTGATATCCGTGATGAATCCTCTCGAGAAGGAATAAGGTTAGTGATTGAAGTAAAACGCGACACTGTACCTGAAGTTTTATTAAATCAACTTTTTAAATCAACAAGATTACAAGTACGCTTCCCTGTTAATATGCTTGCTTTAGTTAAAGGAGCTCCTGTACTTCTCAACATGAAACAAGCTTTGGAAGTATATCTTGATCATCAAATTGATGTTCTTGTTAGAAAAACAAAGTTTGTGCTTAATAAACAACAAGAACGTTATCACATTTTAAGCGGACTTTTAATTGCTGCTTTAAATATTGATGAGGTTGTTGCAATTATTAAAAAATCAGCAAATAACCAGGAAGCAATTAATACATTAAATACAAAGTTTAAGCTTGATGAAATTCAAGCTAAAGCAGTTCTTGACATGCGTTTAAGGAGCTTAAGCGTACTTGAAGTTAACAAACTTCAAACTGAACAAAAAGAGTTAAAAGATTCAATTGAATTTTGTAAGAAAGTGTTAGCTGATCAAAAATTACAGCTAAAAATAATCAAAGAGGAATTGCAAAAAATCAATGATCAGTTTGGTGATGAAAGAAGAAGTGAAATTCTCTATGATATCTCTGAGGAAATTGATGATGAATCATTGATAAAAGTTGAGAATGTAGTGATAACTATGTCTACAAATGGTTATCTAAAAAGGATTGGAGTTGATGCTTATAATCTTCAACATCGTGGTGGAGTTGGGGTTAAAGGGCTAACTACTTATGTTGATGATAGTATTAGTCAATTATTGGTCTGTTCAACTCACTCTGACTTATTATTTTTTACTGATAAGGGTAAGGTTTATAGAATTAGAGCTCATCAAATTCCCTATGGTTTTAGAACAAATAAAGGTATTCCCGCTGTTAACTTAATCAAAATTGAAAAGGATGAAAGAATTTGTTCATTGTTATCTGTTAATAACTATGATGATGGTTATTTCTTTTTCTGTACTAAAAATGGAATTGTTAAAAGAACGAGCTTGAATGAATTCATCAACATCTTAAGTAATGGTAAGCGGGCTATATCTTTTGATGATAATGACACTTTGTATTCAGTAATTAAAACCCACGGA
AATGATGAGATTTTTATTGGTTCTACCAATGGATTTGTTGTTCGCTTCCATGAAAATCAACTCAGAGTTCTTTCAAGAACAGCAAGAGGTGTATTTGGTATCAGTTTAAATAAAGGAGAATTTGTTAATGGACTATCAACTTCAAGCAACGGTAGCTTACTTTTATCAGTCGGTCAAAATGGAATAGGTAAATTAACGAGCATAGATAAATATAGACTCACAAAACGTAATGCTAAGGGAGTTAAAACTCTAAGGGTTACTGATAGAACAGGCCCTGTTGTTACAACAACCACTGTTTTTGGTAATGAGGATCTTTTAATGATTTCCTCTGCTGGTAAAATTGTGCGTACCAGTTTACAAGAACTTTCAGAACAAGGTAAAAACACTTCTGGTGTTAAGTTAATTAGATTAAAAGATAATGAACGTTTAGAAAGAGTAACTATCTTTAAAGAAGAGTTAGAAGACAAAGAAATGCAACTAGAAGATGTTGGATCCAAACAAATTACGCAATAA
196
- .........
197
- Partial_Match_Genes:
198
- Gene:9923_11251_+_ATG_TAA
199
- ATGAAAAGCGAAATTAATATTTTTGCACTAGCAACTGCACCTTTTAATAGTGCATTACATATTATTAGGTTTTCTGGTCCTGATGTTTATGAGATTTTAAACAAGATAACTAATAAAAAAATAACAAGAAAAGGGATGCAAATTCAACGCACATGGATAGTTGATGAAAACAATAAGCGAATTGATGATGTGCTATTATTTAAATTTGTCTCTCCAAATTCTTATACAGGAGAAGATTTAATTGAAATTTCTTGTCATGGTAACATGTTGATCGTTAATGAAATTTGCGCACTTCTTTTAAAAAAAGGAGGTGTTTATGCCAAACCTGGTGAATTTACCCAAAGGAGTTTTTTAAATGGAAAAATGAGTTTACAACAAGCTAGTGCTGTAAATAAATTGATTTTATCTCCTAACTTATTAGTTAAAGATATAGTCTTAAATAATTTAGCGGGTGAAATGGATCAACAATTAGAACAAATAGCTCAACAAGTTAATCAATTAGTAATGCAAATGGAAGTAAACATTGATTATCCAGAATATCTTGATGAACAAGTAGAACTATCAACTTTAAATAATAAAGTTAAATTGATTATTGAAAAGCTTAAAAGAATTATTGAAAATAGTAAACAACTCAAAAAACTTCACGATCCTTTTAAAATTGCCATTATAGGCGAAACTAATGTAGGTAAATCTTCTTTACTCAACGCTTTATTAAATCAAGATAAAGCGATAGTTTCAAATATTAAAGGTAGTACACGCGATGTTGTTGAAGGGGATTTCAATTTAAATGGTTATTTAATCAAGATCTTAGATACTGCAGGTATCCGTAAACATAAAAGTGGGCTTGAAAAAGCAGGAATTAAAAAAAGCTTTGAATCTATAAAGCAAGCTAATTTGGTTATTTATCTTTTAGATGCAACACATCCAAAGAAAGATCTTGAATTAATTAGTTTTTTTAAGAAAAATAAAAAGGATTTTTTTGTTTTCTATAACAAAAAAGATTTAATTACAAATAAGTTTGAAAATAGTATTTCTGCAAAGCAAAAAGATATTAAAGAATTAGTTGATTTATTAACTAAATATATTAACGAGTTTTATAAAAAAATAGATCAAAAAATCTATCTGATTGAAAATTGACAGCAAATTTTAATTGAAAAAATTAAAGAACAATTAGAACAGTTTTTAAAGCAACAAAAAAAATATTTATTTTTCGATGTTTTAGTTACCCATCTAAGAGAAGCTCAACAAGATATTCTTAAACTACTAGGTAAGGATGTAGGTTTTGATTTAGTTAATGAAATTTTTAATAATTTTTGTTTAGGAAAATAA
200
- ORF:9923_11059_+_ATG_TGA
201
- ATGAAAAGCGAAATTAATATTTTTGCACTAGCAACTGCACCTTTTAATAGTGCATTACATATTATTAGGTTTTCTGGTCCTGATGTTTATGAGATTTTAAACAAGATAACTAATAAAAAAATAACAAGAAAAGGGATGCAAATTCAACGCACATGGATAGTTGATGAAAACAATAAGCGAATTGATGATGTGCTATTATTTAAATTTGTCTCTCCAAATTCTTATACAGGAGAAGATTTAATTGAAATTTCTTGTCATGGTAACATGTTGATCGTTAATGAAATTTGCGCACTTCTTTTAAAAAAAGGAGGTGTTTATGCCAAACCTGGTGAATTTACCCAAAGGAGTTTTTTAAATGGAAAAATGAGTTTACAACAAGCTAGTGCTGTAAATAAATTGATTTTATCTCCTAACTTATTAGTTAAAGATATAGTCTTAAATAATTTAGCGGGTGAAATGGATCAACAATTAGAACAAATAGCTCAACAAGTTAATCAATTAGTAATGCAAATGGAAGTAAACATTGATTATCCAGAATATCTTGATGAACAAGTAGAACTATCAACTTTAAATAATAAAGTTAAATTGATTATTGAAAAGCTTAAAAGAATTATTGAAAATAGTAAACAACTCAAAAAACTTCACGATCCTTTTAAAATTGCCATTATAGGCGAAACTAATGTAGGTAAATCTTCTTTACTCAACGCTTTATTAAATCAAGATAAAGCGATAGTTTCAAATATTAAAGGTAGTACACGCGATGTTGTTGAAGGGGATTTCAATTTAAATGGTTATTTAATCAAGATCTTAGATACTGCAGGTATCCGTAAACATAAAAGTGGGCTTGAAAAAGCAGGAATTAAAAAAAGCTTTGAATCTATAAAGCAAGCTAATTTGGTTATTTATCTTTTAGATGCAACACATCCAAAGAAAGATCTTGAATTAATTAGTTTTTTTAAGAAAAATAAAAAGGATTTTTTTGTTTTCTATAACAAAAAAGATTTAATTACAAATAAGTTTGAAAATAGTATTTCTGCAAAGCAAAAAGATATTAAAGAATTAGTTGATTTATTAACTAAATATATTAACGAGTTTTATAAAAAAATAGATCAAAAAATCTATCTGATTGAAAATTGA
202
-
203
- Gene:11251_12039_+_ATG_TAA
204
- ATGGAATACTTTGATGCACATTGTCATTTAAATTGTGAACCTTTACTGAGTGAAATTGAAAAAAGCATCGCTAATTTCAAATTAATTAATTTAAAAGCAAATGTTGTAGGTACAGATTTGGATAATTCTAAAATTGCTGTTGAATTAGCTAAAAAATATCCTGATCTTTTAAAAGCAACCATAGGTATCCATCCAAATGATGTTCATTTAGTTGATTTTAAAAAGACAAAAAAACAACTTAATGAACTATTAATAAATAACAGAAATTTCATAAGTTGTATTGGTGAATATGGTTTTGATTATCACTACACAACAGAATTTATTGAATTGCAAAACAAATTCTTTGAGATGCAATTTGAAATAGCTGAAACTAATAAATTGGTTCACATGCTTCATATTCGTGATGCTCATGAAAAAATTTATGAAATATTAACAAGATTAAAGCCAACTCAACCTGTGATTTTTCATTGTTTCAGTCAAGATATAAATATTGCTAAAAAGCTACTATCATTAAAAGATTTAAATATTGACATCTTCTTTTCTATCCCAGGGATAGTTACTTTTAAGAATGCTCAAGCATTACATGAAGCTTTAAAGATTATTCCTAGTGAATTACTTTTAAGTGAAACTGACTCACCGTGATTAACCCCTTCTCCTTTTCGAGGCAAAGTTAACTGACCTGAATATGTAGTTCATACTGTTAGCACTGTTGCTGAAATAAAAAAAATAGAAATTGCTGAAATGAAGCGAATTATTGTTAAAAATGCAAAAAAATTATTTTGACATTAA
205
- ORF:11251_11892_+_ATG_TGA
206
- ATGGAATACTTTGATGCACATTGTCATTTAAATTGTGAACCTTTACTGAGTGAAATTGAAAAAAGCATCGCTAATTTCAAATTAATTAATTTAAAAGCAAATGTTGTAGGTACAGATTTGGATAATTCTAAAATTGCTGTTGAATTAGCTAAAAAATATCCTGATCTTTTAAAAGCAACCATAGGTATCCATCCAAATGATGTTCATTTAGTTGATTTTAAAAAGACAAAAAAACAACTTAATGAACTATTAATAAATAACAGAAATTTCATAAGTTGTATTGGTGAATATGGTTTTGATTATCACTACACAACAGAATTTATTGAATTGCAAAACAAATTCTTTGAGATGCAATTTGAAATAGCTGAAACTAATAAATTGGTTCACATGCTTCATATTCGTGATGCTCATGAAAAAATTTATGAAATATTAACAAGATTAAAGCCAACTCAACCTGTGATTTTTCATTGTTTCAGTCAAGATATAAATATTGCTAAAAAGCTACTATCATTAAAAGATTTAAATATTGACATCTTCTTTTCTATCCCAGGGATAGTTACTTTTAAGAATGCTCAAGCATTACATGAAGCTTTAAAGATTATTCCTAGTGAATTACTTTTAAGTGAAACTGACTCACCGTGA
207
- .......
208
- Missed_Genes:
209
- >Myco_1828_2760_+
210
- ATGAATCTTTACGATCTTTTAGAACTACCAACTACAGCATCAATAAAAGAAATAAAAATTGCTTATAAAAGATTAGCAAAGCGTTATCACCCTGATGTAAATAAATTAGGTTCGCAAACTTTTGTTGAAATTAATAATGCTTATTCAATATTAAGTGATCCTAACCAAAAGGAAAAATATGATTCAATGCTGAAAGTTAATGATTTTCAAAATCGCATCAAAAATTTAGATATTAGTGTTAGATGACATGAAAATTTCATGGAAGAACTCGAACTTCGTAAGAACTGAGAATTTGATTTTTTTTCATCTGATGAAGATTTCTTTTATTCTCCATTTACAAAAAACAAATATGCTTCCTTTTTAGATAAAGATGTTTCTTTAGCTTTTTTTCAGCTTTACAGCAAGGGCAAAATAGATCATCAATTGGAAAAATCTTTATTGAAAAGAAGAGATGTAAAAGAAGCTTGTCAACAGAATAAAAATTTTATTGAAGTTATAAAAGAGCAATATAACTATTTTGGTTGAATTGAAGCTAAGCGTTATTTCAATATTAATGTTGAACTTGAGCTCACACAGAGAGAGATAAGAGATAGAGATGTTGTTAACCTACCTTTAAAAATTAAAGTTATTAATAATGATTTTCCAAATCAACTCTGATATGAAATTTATAAAAACTATTCATTTCGCTTATCTTGAGATATAAAAAATGGTGAAATTGCTGAATTTTTCAATAAAGGTAATAGAGCTTTAGGATGAAAAGGTGACTTAATTGTCAGAATGAAAGTAGTTAATAAAGTAAACAAAAGACTGCGTATTTTTTCAAGCTTTTTTGAGAACGATAAATCTAAATTATGGTTCCTTGTTCCAAACGATAAACAAAGTAATCCTAATAAGGGCGTTTTTAACTATAAAACTCAGCACTTTATTGATTAA
211
-
212
- >Myco_2845_4797_+
213
- ATGGAAGAAAATAACAAAGCAAATATCTATGACTCTAGTAGCATTAAGGTCCTTGAAGGACTTGAGGCTGTTAGAAAACGCCCTGGAATGTACATTGGTTCTACTGGCGAAGAAGGTTTGCATCACATGATCTGAGAGATAGTAGACAACTCAATTGATGAAGCAATGGGAGGTTTTGCCAGTTTTGTTAAGCTTACCCTTGAAGATAATTTTGTTACCCGTGTAGAGGATGATGGAAGAGGGATACCTGTTGATATCCATCCTAAGACTAATCGTTCTACAGTTGAAACAGTTTTTACAGTTCTACACGCTGGCGGTAAATTTGATAACGATAGCTATAAAGTGTCAGGTGGTTTACACGGTGTTGGTGCATCAGTTGTTAATGCGCTTAGTTCTTCTTTTAAAGTTTGAGTTTTTCGTCAAAATAAAAAGTATTTTCTCAGCTTTAGCGATGGAGGAAAGGTAATTGGAGATTTGGTCCAAGAAGGTAACTCTGAAAAAGAGCATGGAACAATTGTTGAGTTTGTTCCTGATTTCTCTGTAATGGAAAAGAGTGATTACAAACAAACTGTAATTGTAAGCAGACTCCAGCAATTAGCTTTTTTAAACAAGGGAATAAGAATTGACTTTGTTGATAATCGTAAACAAAACCCACAGTCTTTTTCTTGAAAATATGATGGGGGATTGGTTGAATATATCCACCACCTAAACAACGAAAAAGAACCACTTTTTAATGAAGTTATTGCTGATGAAAAAACTGAAACTGTAAAAGCTGTTAATCGTGATGAAAACTACACAGTAAAGGTTGAAGTTGCTTTTCAATATAACAAAACATACAACCAATCAATTTTCAGTTTTTGTAACAACATTAATACTACAGAAGGTGGAACCCATGTGGAAGGTTTTCGTAATGCACTTGTTAAGATCATTAATCGCTTTGCTGTTGAAAATAAATTCCTAAAAGATAGTGATGAAAAGATTAACCGTGATGATGTTTGTGAAGGATTAACTGCTATTATTTCCATTAAACACCCAAACCCACAATATGAAGGACAAACTAAAAAGAAGTTAGGTAATACTGAGGTAAGACCTTTAGTTAATAGTGTTGTTAGTGAAATCTTTGAACGCTTCATGTTAGAAAACCCACAAGAAGCAAACGCTATCATCAGAAAAACACTTTTAGCTCAAGAAGCGAGAAGAAGAAGTCAAGAGGCTAGGGAGTTAACTCGTCGTAAATCACCTTTTGATAGTGGTTCATTACCAGGTAAATTAGCTGATTGTACAACCAGAGATCCTTCGATTAGTGAACTTTACATTGTTGAGGGTGATAGTGCTGGTGGCACTGCTAAAACAGGAAGAGATCGTTATTTTCAAGCTATCTTACCCTTAAGAGGAAAGATTTTAAACGTTGAAAAATCTAACTTTGAACAAATCTTTAATAATGCAGAAATTTCTGCATTAGTGATGGCAATAGGCTGTGGGATTAAACCTGATTTTGAACTTGAAAAACTTAGATATAGCAAGATTGTGATCATGACAGATGCTGATGTTGATGGTGCACACATAAGAACACTTCTCTTAACTTTCTTTTTTCGCTTTATGTATCCTTTGGTTGAACAAGGCAATATTTTTATTGCTCAACCCCCACTTTATAAAGTGTCATATTCCCATAAGGATTTATACATGCACACTGATGTTCAACTTGAACAGTGAAAAAGTCAAAACCCTAACGTAAAGTTTGGGTTACAAAGATATAAAGGACTTGGAGAAATGGATGCATTGCAGCTGTGAGAAACAACAATGGATCCTAAGGTTAGAACATTGTTAAAAGTTACTGTTGAAGATGCTTCTATTGCTGATAAAGCTTTTTCACTGTTGATGGGTGATGAAGTTCCCCCAAGAAGAGAATTTATTGAAAAAAATGCTCGTAGTGTTAAAAACATTGATATTTAA
214
-
215
- >Myco_7294_8547_+
216
- ATGTTGGATCCAAACAAATTACGCAATAACTATGATTTCTTTAAAAAGAAACTGTTAGAAAGAAATGTAAATGAGCAATTATTAAATCAGTTTATTCAAACTGATAAACTAATGCGCAAAAACTTGCAACAACTTGAACTTGCTAACCAAAAACAAAGCTTGTTGGCAAAACAAGTTGCTAAGCAAAAAGATAATAAAAAGCTATTAGCTGAATCAAAAGAACTTAAGCAGAAGATTGAAAACTTAAATAATGCTTATAAAGATTCACAAAACATTAGTCAAGATTTACTTCTAAATTTTCCTAATATTGCTCATGAATCAGTTCCTGTTGGTAAAAATGAATCAGCAAACTTAGAACTTCTTAAAGAAGGGAGAAAACCAGTTTTTGATTTCAAACCTTTACCACATCGAGAGTTATGTGAAAAGTTAAATTTAGTTGCTTTTGATAAAGCTACTAAGATTAGTGGAACTAGGTTTGTTGCATATACAGATAAAGCAGCTAAACTACTTAGAGCGATAACTAATCTAATGATTGACCTTAATAAAAGCAAGTATCAAGAATGAAACCTGCCAGTTGTTATTAATGAATTAAGTTTAAGATCAACCGGACAACTACCTAAGTTTAAAGATGATGTTTTTAAACTAGAAAACACCCGTTATTATCTTTCTCCAACTTTAGAGGTACAACTTATCAATTTACATGCTAATGAAATTTTTAATGAAGAAGATTTACCTAAATACTACACTGCAACAGGTATTAACTTTCGTCAAGAAGCGGGTAGTGCTGGTAAACAAACCAAAGGAACTATTAGATTGCATCAGTTTCAAAAAACTGAGTTAGTTAAGTTTTGTAAACCTGAAAATGCTATCAATGAATTGGAAGCAATGGTTAGAGATGCTGAACAAATCTTAAAGGCACTTAAGTTACCTTTTAGAAGGTTATTGTTATGTACTGGTGATATGGGCTTTAGTGCTGAAAAAACATATGATCTTGAAGTTTGAATGGCAGCTAGCAATGAATATCGTGAAGTTTCTTCTTGTTCATCTTGTGGTGATTTTCAAGCAAGAAGAGCTATGATTCGTTACAAAGATATTAACAACGGTAAAAACAGTTATGTTGCTACTTTAAATGGAACAGCATTATCTATTGATAGAATTTTTGCTGCAATTCTAGAAAATTTTCAAACAAAAGATGGCAAAATTCTTATCCCACAAGCATTAAAAAAATACCTTGATTTTGACACAATCAAGTAA
217
- ......
218
-
219
- ORFs_Without_Corresponding_Gene_In_Reference_Metrics:
220
- ATG_Start ,GTG_Start ,TTG_Start ,ATT_Start ,CTG_Start ,Alternative_Start_Codon ,TGA_Stop ,TAA_Stop ,TAG_Stop ,Alternative_Stop_Codon ,Median_Length ,ORFs_on_Positive_Strand ,ORFs_on_Negative_Strand
221
- 58.39,17.14,24.47,0.00,0.00,0.00,71.55,20.62,7.83,0.00,287.00,449,356
222
- ORF_Without_Corresponding_Gene_in_Reference:
223
- >Prodigal_1828_2073_+
224
- ATGAATCTTTACGATCTTTTAGAACTACCAACTACAGCATCAATAAAAGAAATAAAAATTGCTTATAAAAGATTAGCAAAGCGTTATCACCCTGATGTAAATAAATTAGGTTCGCAAACTTTTGTTGAAATTAATAATGCTTATTCAATATTAAGTGATCCTAACCAAAAGGAAAAATATGATTCAATGCTGAAAGTTAATGATTTTCAAAATCGCATCAAAAATTTAGATATTAGTGTTAGATGA
225
- >Prodigal_2605_2760_+
226
- ATGAAAGTAGTTAATAAAGTAAACAAAAGACTGCGTATTTTTTCAAGCTTTTTTGAGAACGATAAATCTAAATTATGGTTCCTTGTTCCAAACGATAAACAAAGTAATCCTAATAAGGGCGTTTTTAACTATAAAACTCAGCACTTTATTGATTAA
227
- >Prodigal_2845_2979_+
228
- ATGGAAGAAAATAACAAAGCAAATATCTATGACTCTAGTAGCATTAAGGTCCTTGAAGGACTTGAGGCTGTTAGAAAACGCCCTGGAATGTACATTGGTTCTACTGGCGAAGAAGGTTTGCATCACATGATCTGA
229
- >Prodigal_3010_3255_+
230
- ATGGGAGGTTTTGCCAGTTTTGTTAAGCTTACCCTTGAAGATAATTTTGTTACCCGTGTAGAGGATGATGGAAGAGGGATACCTGTTGATATCCATCCTAAGACTAATCGTTCTACAGTTGAAACAGTTTTTACAGTTCTACACGCTGGCGGTAAATTTGATAACGATAGCTATAAAGTGTCAGGTGGTTTACACGGTGTTGGTGCATCAGTTGTTAATGCGCTTAGTTCTTCTTTTAAAGTTTGA
231
- >Prodigal_3319_3513_+
232
- TTGGTCCAAGAAGGTAACTCTGAAAAAGAGCATGGAACAATTGTTGAGTTTGTTCCTGATTTCTCTGTAATGGAAAAGAGTGATTACAAACAAACTGTAATTGTAAGCAGACTCCAGCAATTAGCTTTTTTAAACAAGGGAATAAGAATTGACTTTGTTGATAATCGTAAACAAAACCCACAGTCTTTTTCTTGA
233
- >Prodigal_3529_4557_+
234
- TTGGTTGAATATATCCACCACCTAAACAACGAAAAAGAACCACTTTTTAATGAAGTTATTGCTGATGAAAAAACTGAAACTGTAAAAGCTGTTAATCGTGATGAAAACTACACAGTAAAGGTTGAAGTTGCTTTTCAATATAACAAAACATACAACCAATCAATTTTCAGTTTTTGTAACAACATTAATACTACAGAAGGTGGAACCCATGTGGAAGGTTTTCGTAATGCACTTGTTAAGATCATTAATCGCTTTGCTGTTGAAAATAAATTCCTAAAAGATAGTGATGAAAAGATTAACCGTGATGATGTTTGTGAAGGATTAACTGCTATTATTTCCATTAAACACCCAAACCCACAATATGAAGGACAAACTAAAAAGAAGTTAGGTAATACTGAGGTAAGACCTTTAGTTAATAGTGTTGTTAGTGAAATCTTTGAACGCTTCATGTTAGAAAACCCACAAGAAGCAAACGCTATCATCAGAAAAACACTTTTAGCTCAAGAAGCGAGAAGAAGAAGTCAAGAGGCTAGGGAGTTAACTCGTCGTAAATCACCTTTTGATAGTGGTTCATTACCAGGTAAATTAGCTGATTGTACAACCAGAGATCCTTCGATTAGTGAACTTTACATTGTTGAGGGTGATAGTGCTGGTGGCACTGCTAAAACAGGAAGAGATCGTTATTTTCAAGCTATCTTACCCTTAAGAGGAAAGATTTTAAACGTTGAAAAATCTAACTTTGAACAAATCTTTAATAATGCAGAAATTTCTGCATTAGTGATGGCAATAGGCTGTGGGATTAAACCTGATTTTGAACTTGAAAAACTTAGATATAGCAAGATTGTGATCATGACAGATGCTGATGTTGATGGTGCACACATAAGAACACTTCTCTTAACTTTCTTTTTTCGCTTTATGTATCCTTTGGTTGAACAAGGCAATATTTTTATTGCTCAACCCCCACTTTATAAAGTGTCATATTCCCATAAGGATTTATACATGCACACTGATGTTCAACTTGAACAGTGA
235
- ....
236
- ORFs_Which_Detected_more_than_one_Gene:
237
-
238
- ```
239
-
240
-
241
- ## GFF Tools:
242
-
243
- ### GFF-Adder:
244
-
245
- GFF-Adder allows for the addition of predicted CDSs to an existing reference annotation (GFF or another tool) which produces a new GFF containing the original
246
- genes plus the new CDS from another prediction. Default filtering will remove additional CDSs that overlap existing genes by more than 50 nt.
247
- The ```-gi``` option can be used to allow for different genomic elements to be accounted for, other than only CDSs in the reference annotation.
248
-
249
- For Help: ```GFF-Adder -h ```
250
-
251
- ```python
252
- ORForise v1.5.1: GFF-Adder Run Parameters.
253
-
254
- Required Arguments:
255
- -dna GENOME_DNA Genome DNA file (.fa) which both annotations are based on
256
- -ref REFERENCE_ANNOTATION
257
- Which reference annotation file to use as reference?
258
- -at ADDITIONAL_TOOL Which format to use for additional annotation?
259
- -add ADDITIONAL_ANNOTATION
260
- Which annotation file to add to reference annotation?
261
- -o OUTPUT_FILE Output filename
262
-
263
- Optional Arguments:
264
- -rt REFERENCE_TOOL Which tool format to use as reference? - If not provided, will default to standard Ensembl
265
- GFF format, can be Prodigal or any of the other tools available
266
- -gi GENE_IDENT Identifier used for extraction of "genic" regions from reference annotation "CDS,rRNA,tRNA":
267
- Default for is "CDS"
268
- -gene_ident GENE_IDENT
269
- Identifier used for identifying genomic features in reference annotation "CDS,rRNA,tRNA"
270
- -olap OVERLAP Maximum overlap between reference and additional genic regions (CDS,rRNA etc) - Default: 50
271
- nt
272
- ```
273
-
274
- #### Example: Running GFF-Adder to combine the additional CDS predictions made by Prodial to the canonical annotations from Ensembl.
275
- ``` GFF-Adder -dna ~/Testing/Myco.fa -ref ~/Testing/Myco.gff -at Prodigal -add ~/Testing/Prodigal_Myco.gff -o ~/Testing/Myco_Ensembl_GFF_Adder_Prodigal.gff ```
276
- #### Example Output: ~/ORForise/Testing/Myco_Ensembl_GFF_Adder_Prodigal.gff
277
- ```
278
- ##gff-version 3
279
- # GFF-Adder
280
- # Run Date:2021-11-10
281
- ##Genome DNA File:./Testing/Myco.fa
282
- ##Original File: ./Testing/Myco.gff
283
- ##Additional File: ./Testing/Prodigal_Myco.gff
284
- .......
285
- Chromosome Reference_Annotation CDS 68522 70225 . - . ID=Original_Annotation
286
- Chromosome Reference_Annotation CDS 70530 72572 . + . ID=Original_Annotation
287
- Chromosome Reference_Annotation CDS 72523 73434 . + . ID=Original_Annotation
288
- Chromosome Prodigal CDS 73445 73648 . + . ID=Additional_Annotation
289
- Chromosome Reference_Annotation CDS 73690 77685 . + . ID=Original_Annotation
290
- Chromosome Reference_Annotation CDS 77685 79085 . + . ID=Original_Annotation
291
- Chromosome Reference_Annotation CDS 79089 81035 . + . ID=Original_Annotation
292
- Chromosome Reference_Annotation CDS 81046 82596 . + . ID=Original_Annotation
293
- Chromosome Reference_Annotation CDS 82620 84044 . + . ID=Original_Annotation
294
- Chromosome Prodigal CDS 84082 84312 . + . ID=Additional_Annotation
295
- Chromosome Prodigal CDS 84532 84744 . - . ID=Additional_Annotation
296
- Chromosome Prodigal CDS 84776 85051 . + . ID=Additional_Annotation
297
- ```
298
-
299
- ### GFF-Intersector:
300
-
301
- GFF-Intersector enables the aggregation of different genome annotations and CDS predictions and creates a single GFF
302
- representing the intersection of the two existing annotations.
303
- GFF-Intersector also provides an option to retain genes that differ from the reference by a user-defined amount (minimum % coverage, and in-frame).
304
- The ```-gi``` option can be used to allow for different genomic elements to be accounted for, other than only CDSs in the reference annotation.
305
-
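The coverage criterion described above can be sketched as follows. This is an illustration under stated assumptions rather than the tool's implementation: the names `coverage_pct`, `in_frame` and `intersects` are hypothetical, coordinates are 1-based and inclusive, a threshold of 100 is read as "exact match", and the frame is assumed to be judged from the start coordinate on the `+` strand and the end coordinate on the `-` strand.

```python
def coverage_pct(ref_start, ref_end, add_start, add_end):
    """Percentage of the reference region covered by the additional region."""
    ref_len = ref_end - ref_start + 1
    shared = max(0, min(ref_end, add_end) - max(ref_start, add_start) + 1)
    return 100.0 * shared / ref_len


def in_frame(ref, add):
    """True if both regions share a strand and reading frame (assumed rule)."""
    r_start, r_end, r_strand = ref
    a_start, a_end, a_strand = add
    if r_strand != a_strand:
        return False
    # Compare the coordinate that anchors the frame on each strand.
    if r_strand == "+":
        return (r_start - a_start) % 3 == 0
    return (r_end - a_end) % 3 == 0


def intersects(ref, add, min_cov=100):
    """Decide whether a reference region is confirmed by an additional one."""
    if min_cov >= 100:
        return ref == add  # default: exact match only
    cov = coverage_pct(ref[0], ref[1], add[0], add[1])
    return in_frame(ref, add) and cov >= min_cov


ref_gene = (686, 1828, "+")
shorter_prediction = (698, 1828, "+")  # made-up: starts 12 nt downstream

print(intersects(ref_gene, ref_gene))                       # exact match
print(intersects(ref_gene, shorter_prediction))             # fails at 100%
print(intersects(ref_gene, shorter_prediction, min_cov=90)) # passes at 90%
```

With the default threshold only identical coordinates count as an intersection; lowering the threshold admits in-frame predictions that cover at least that share of the reference gene.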
306
- For Help: ```GFF-Intersector -h ```
307
- ```python
308
- ORForise v1.5.1: GFF-Intersector Run Parameters.
309
-
310
- Required Arguments:
311
- -dna GENOME_DNA Genome DNA file (.fa) which both annotations are based on
312
- -ref REFERENCE_ANNOTATION
313
- Which reference annotation file to use as reference?
314
- -at ADDITIONAL_TOOL Which format to use for additional annotation?
315
- -add ADDITIONAL_ANNOTATION
316
- Which annotation file to add to reference annotation?
317
- -o OUTPUT_FILE Output filename
318
-
319
- Optional Arguments:
320
- -rt REFERENCE_TOOL Which tool format to use as reference? - If not provided, will default to standard Ensembl
321
- GFF format, can be Prodigal or any of the other tools available
322
- -gi GENE_IDENT Identifier used for extraction of "genic" regions from reference annotation "CDS,rRNA,tRNA":
323
- Default for is "CDS"
324
- -cov COVERAGE Percentage coverage of reference annotation needed to confirm intersection - Default: 100 ==
325
- exact match
326
- ```
327
-
328
- #### Example: Running GFF-Intersector to intersect the CDS predictions made by Prodigal with the canonical annotations from Ensembl.
329
- ``` GFF-Intersector -dna ~/Testing/Myco.fa -ref ~/Testing/Myco.gff -at Prodigal -add ~/Testing/Prodigal_Myco.gff -o ~/Testing/Myco_Ensembl_GFF_Intersector_Prodigal.gff```
330
-
331
- #### Example Output: ~/Testing/Myco_Ensembl_GFF_Intersector_Prodigal.gff
332
- ```
333
- ##gff-version 3
334
- # GFF-Intersector
335
- # Run Date:2021-11-10
336
- ##Genome DNA File:./Testing/Myco.fa
337
- ##Original File: ./Testing/Myco.gff
338
- ##Intersecting File: ./Testing/Prodigal_Myco.gff
339
- Chromosome original CDS 686 1828 . + . ID=Original_Annotation;Coverage=100
340
- Chromosome original CDS 4812 7322 . + . ID=Original_Annotation;Coverage=100
341
- Chromosome original CDS 8551 9183 . + . ID=Original_Annotation;Coverage=100
342
- Chromosome original CDS 22389 23558 . + . ID=Original_Annotation;Coverage=100
343
- Chromosome original CDS 29552 30124 . + . ID=Original_Annotation;Coverage=100
344
- Chromosome original CDS 31705 32325 . - . ID=Original_Annotation;Coverage=100
345
- Chromosome original CDS 49376 49642 . + . ID=Original_Annotation;Coverage=100
346
- Chromosome original CDS 59082 59753 . + . ID=Original_Annotation;Coverage=100
347
- Chromosome original CDS 61014 61406 . + . ID=Original_Annotation;Coverage=100
348
- Chromosome original CDS 82620 84044 . + . ID=Original_Annotation;Coverage=100
349
- ```
350
-
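Output files like the one above can be inspected with a few lines of Python. The sketch below is not part of ORForise: `parse_gff_line` and `read_features` are hypothetical helpers that assume nine whitespace- or tab-separated GFF columns and `key=value` attributes separated by `;`.

```python
def parse_gff_line(line):
    """Split one GFF feature line into a dictionary (hypothetical helper)."""
    fields = line.rstrip("\n").split(None, 8)  # at most 9 columns
    if len(fields) != 9:
        raise ValueError(f"expected 9 GFF columns, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    attributes = dict(
        pair.split("=", 1) for pair in attrs.split(";") if "=" in pair
    )
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def read_features(lines):
    """Parse every non-empty, non-comment line of a GFF file."""
    return [
        parse_gff_line(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]


# Two feature lines taken from the example output above.
sample = """##gff-version 3
# GFF-Intersector
Chromosome\toriginal\tCDS\t686\t1828\t.\t+\t.\tID=Original_Annotation;Coverage=100
Chromosome\toriginal\tCDS\t31705\t32325\t.\t-\t.\tID=Original_Annotation;Coverage=100
"""
features = read_features(sample.splitlines())
for feature in features:
    print(feature["start"], feature["end"], feature["attributes"]["Coverage"])
```

To read a real file, pass an open file handle to `read_features` instead of `sample.splitlines()`.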
351
-
352
- # Genomes Available:
353
-
354
- The .fa and .gff files (from Ensembl Bacteria Release 46) below are available in the Genomes directory.
355
-
356
- * *Bacillus subtilis* - Strain BEST7003 - Assembly ASM52304v1
357
- * *Caulobacter crescentus* - Strain CB15 - Assembly ASM690v1
358
- * *Escherichia coli K-12* - Strain ER3413 - Assembly ASM80076v1
359
- * *Mycoplasma genitalium* - Strain G37 - Assembly ASM2732v1
360
- * *Pseudomonas fluorescens* - Strain UK4 - Assembly ASM73042v1
361
- * *Staphylococcus aureus* - Strain 502A - Assembly ASM59796v1
362
-
363
- # Prediction Tools Available:
364
-
365
- There are two groups of tools: those which require a pre-built model and those which do not. \
366
- For the example runs provided, each tool is listed with the non-default options used and their predictions for each of the 6 model organisms are available in their respective
367
- directories.
368
- ORForise only needs the tool name and the annotation file produced from any available model to undertake the analysis.
369
-
370
- ## GFF Standard Format:
371
-
372
- The GFF Tool directory allows for the analysis of user-provided annotations in the standard GFF3 format. \
373
- This can be used to compare different canonical annotations with each other or with additional tools which use the GFF3 format.
374
-
375
- ## Model Based Tools:
376
-
377
- **Augustus - Version 3.3.3** - http://bioinf.uni-greifswald.de/augustus/
378
- This tool has three comparisons with the organism models *E. coli*, *S. aureus* and *H. sapiens*.
379
-
380
- **EasyGene - Version 1.2** - http://www.cbs.dtu.dk/services/EasyGene/
381
- This tool has two comparisons with the organism models *E. coli - K12* and *S. aureus Mu50*.
382
-
383
- **FGENESB - Version '2020'** - http://www.softberry.com/berry.phtml?topic=fgenesb&group=programs&subgroup=gfindb
384
- This tool has two comparisons with the organism models *E. coli - K12* and *S. aureus Mu50*.
385
-
386
- **GeneMark - Version 2.5** - http://exon.gatech.edu/GeneMark/gm.cgi
387
- This tool has two comparisons with the organism models *E. coli - K12 - MG1655* and *S. aureus Mu50*.
388
-
389
- **GeneMark.hmm - Version 3.2.5** - http://exon.gatech.edu/GeneMark/gmhmmp.cgi
390
- This tool has two comparisons with the organism models *E. coli - K12 - MG1655* and *S. aureus Mu50*.
391
-
392
- ## Self-Training/Non-Model Based Tools
393
-
394
- **FragGeneScan - Version 1.3.0** - https://omics.informatics.indiana.edu/FragGeneScan/
395
- The 'complete' genome option was selected and GFF was chosen as output type.
396
-
397
- **GeneMark HA - Version 3.25** - http://exon.gatech.edu/GeneMark/heuristic_gmhmmp.cgi
398
- GFF was chosen as output type.
399
-
400
- **GeneMarkS - Version 4.25** - http://exon.gatech.edu/GeneMark/genemarks.cgi
401
- GFF was chosen as output type.
402
-
403
- **GeneMarkS-2 - Version '2020'** - http://exon.gatech.edu/GeneMark/genemarks2.cgi
404
- GFF3 was chosen as output type.
405
-
406
- **GLIMMER-3 - Version 3.02** - http://ccb.jhu.edu/software/glimmer/index.shtml
407
- Default parameters from manual were used.
408
-
409
- **MetaGene - Version 2.24.0** - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1636498/
410
- Default options were used.
411
-
412
- **MetaGeneAnnotator - Version '2008/8/19'** - http://metagene.nig.ac.jp/
413
- Default options were used.
414
-
415
- **MetaGeneMark - Version '2020'** - http://exon.gatech.edu/meta_gmhmmp.cgi
416
- GFF was chosen as output type.
417
-
418
- **Prodigal - Version 2.6.3** - https://github.com/hyattpd/Prodigal
419
- GFF was chosen as output type.
420
-
421
- **TransDecoder - Version 5.5.0** - https://github.com/TransDecoder/TransDecoder/wiki
422
- Default options were used.
423
-
424
- **Balrog - Version '2021'** - https://github.com/salzberg-lab/Balrog
425
- Default options were used.
426
-
427
-