npsearch 2.1.0 → 2.1.1
- checksums.yaml +4 -4
- data/README.md +41 -28
- data/bin/npsearch +5 -5
- data/exemplar_data/README.md +0 -0
- data/exemplar_data/genetic_data.fa +465 -0
- data/lib/npsearch.rb +2 -2
- data/lib/npsearch/arg_validator.rb +4 -1
- data/lib/npsearch/output.rb +2 -2
- data/lib/npsearch/signalp.rb +24 -21
- data/lib/npsearch/version.rb +1 -1
- data/templates/contents.slim +8 -9
- metadata +4 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 0e02888758654087d6af73b5fc87bba5363a0b4c
+data.tar.gz: 2fd35312b2b18d3dfefe99686397dfd6bd7d5cfb
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 3683325fc2081158d10ab07164e19d8dff0fb16d1031b592c9fdbd6221f13b1877f5f959d78d06acb6978cb351f1e5f96a3d53a9758af16f1f4d6c04a283019c
+data.tar.gz: c9581cbd2ec7b00b22931fc0957e69bdb8cb525981e106ab759ee876e0c9de673d89bcc1156f5792871ec9c0c13357ac8a7ef0a68326c221d7e9873d5608f4fd
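`checksums.yaml` records SHA1 and SHA512 digests of the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). Any change to the packaged files therefore changes all four values. The sketch below shows how to re-check them with Ruby's standard `digest` library. It assumes the gem has already been unpacked (for example with `tar -xf`), so that `checksums.yaml.gz`, `metadata.gz` and `data.tar.gz` sit in one directory. Both helper names are hypothetical and are not part of the RubyGems API.

```ruby
require 'digest'
require 'yaml'
require 'zlib'

# Digest classes for the algorithm keys used in checksums.yaml.
DIGEST_CLASSES = { 'SHA1' => Digest::SHA1, 'SHA512' => Digest::SHA512 }.freeze

# Hypothetical helper: returns [algorithm, file] pairs whose recorded digest
# does not match the bytes given in +contents+ (file name => String).
def checksum_mismatches(checksums, contents)
  checksums.flat_map do |algorithm, entries|
    digest_class = DIGEST_CLASSES.fetch(algorithm)
    entries.reject { |name, expected| digest_class.hexdigest(contents.fetch(name)) == expected }
           .map { |name, _expected| [algorithm, name] }
  end
end

# Hypothetical helper: reads the checksums and archives of an unpacked .gem.
def read_unpacked_gem(dir)
  checksums = Zlib::GzipReader.open(File.join(dir, 'checksums.yaml.gz')) { |gz| YAML.safe_load(gz.read) }
  contents = %w[metadata.gz data.tar.gz].each_with_object({}) do |name, acc|
    acc[name] = File.binread(File.join(dir, name))
  end
  [checksums, contents]
end

# Usage: ruby verify_checksums.rb path/to/unpacked_gem_dir
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  bad = checksum_mismatches(*read_unpacked_gem(ARGV[0]))
  puts bad.empty? ? 'All checksums match' : "Mismatches: #{bad.inspect}"
end
```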
data/README.md
CHANGED
@@ -3,24 +3,34 @@
 [![Gem Version](https://badge.fury.io/rb/npsearch.svg)](http://badge.fury.io/rb/npsearch)
 [![Dependency Status](https://gemnasium.com/wurmlab/NpSearch.svg)](https://gemnasium.com/wurmlab/NpSearch)

-
+<strong>Please note that this is currently in beta. We are working on something that is amazingly fast (i.e. a few seconds to run) and a lot better in every sense (it even has an easy-to-use point-and-click interface). So watch this space.</strong>

 ## Introduction
-NpSearch is a tool that helps identify novel neuropeptides. As such it is not based on homology to existing neuropeptides - rather NpSearch is based on the common characteristics of neuropeptides and their precursors.
+NpSearch is a tool that helps identify novel neuropeptides. As such, it is not based on homology to existing neuropeptides; rather, NpSearch is based on the common characteristics of neuropeptides and their precursors. In other words, it is a feature-based tool.
+
+The results include the entire secretome, ordered by the likelihood that each sequence encodes a neuropeptide. As such, you should only need to analyse the top half of the results.
+
+Importantly, NpSearch produces a highly visual HTML file in which the signal peptide and potential cleavage sites are highlighted. Additionally, it produces a FASTA file of the results (i.e. the ordered secretome) that can easily be used in your own pipelines.

 If you use this program, please cite us:

 >Moghul I, Rowe M, Priyam A, Elphick M & Wurm Y <em>(in prep)</em> NpSearch: A Tool to Identify Novel Neuropeptides

-NpSearch
+NpSearch requires a transcriptomic or predicted proteomic dataset as input, and each sequence is analysed and awarded a relative score reflecting its likelihood of encoding a neuropeptide precursor. When provided with transcriptomic data, NpSearch translates each contig in all six frames and then extracts all potential open reading frames (methionine to stop codon). Each predicted protein sequence is then analysed for the following neuropeptide-related characteristics:
+
+**Signal peptide**: All neuropeptide precursors must have a signal peptide, because the final bioactive neuropeptide has to be secreted from the cell of synthesis in order to be functionally active.
+
+**Cleavage sites**: Being derived from a precursor, the bioactive neuropeptide has to be cleaved out of the precursor. Prohormone convertase enzymes cleave these bioactive peptides at specific cleavage sites. As certain cleavage motifs are more likely to be cleaved than others, NpSearch scores sequences based on the type and number of cleavage sites present.
+
+**C-terminal Glycine**: A significant number of bioactive neuropeptides have a C-terminal glycine, which serves as the amide donor for C-terminal amidation during post-translational modification. Such sequences are therefore awarded a higher score.
+
+**Repeated peptides**: Numerous neuropeptide precursors contain multiple copies of the same neuropeptide. NpSearch attempts to cluster all potential cleaved neuropeptides and then awards sequences that produce larger clusters a higher score.

-
+**Acidic spacer regions**: Neuropeptide precursors that contain multiple neuropeptide copies tend to have highly acidic regions that separate these copies. If such a region is detected, the sequence is awarded a higher score.
+
+
+After analysing each sequence in the input dataset, NpSearch produces a visual HTML file and a FASTA file in which the sequences most likely to encode a neuropeptide precursor are placed at the top. These results files can then easily be inspected and curated by researchers.

-- **Signal peptide**: All neuropeptide precursors must have a signal peptide. This is due to the fact that the final bioactive neuropeptide has to be secreted from the cell of synthesis in order to be functionally active.
-- **Cleavage sites**: Being derived from a precursor, the bioactive neuropeptide has to be cleaved out from the precursor. Prohormone convertase enzymes cleave these bioactive peptides at specific cleavage sites. Since certain cleavage motifs are more likely to be cleaved, NpSearch awards sequences with cleavage site motifs that are more likely to be cleaved with a higher score.
-- **C-terminal Glycine**: A significant number of bioactive neuropeptides have a C-terminal glycine, that is amidated during post-translation modification. NpSearch awards sequences that have a potential neuropeptide with a C-terminal glycine a higher score.
-- **Repeated peptides**: Some neuropeptide precursors contain numerous copies of the same neuropeptides (usually with slight sequence differences). NpSearch attempts to detect this by aligning all potential neuropeptides within a sequence. If a sequence is found to have multiple, similar predicted NPs, NpSearch awards it with a higher score.
-- **Acidic spacer regions**: Neuropeptide precursors that contain multiple neuropeptide copies tend to have highly acidic spacer regions that separate the NP copies. If detected by NpSearch, the sequence is awarded with a higher score.


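The README says NpSearch translates each contig in all six frames and then extracts every methionine-to-stop open reading frame. The Ruby sketch below illustrates that step. It is not NpSearch's own code, and it rests on several assumptions:

- It uses the standard genetic code (NCBI table 1).
- It expects a single-line nucleotide sequence.
- For each stop-to-stop stretch, it takes the first `ATG` after the previous stop.
- It drops ORFs that run off the end of a contig without a stop codon.
- It uses 30 residues as the minimum length, only because 30 is the `--min_orf_length` default shown in this diff. Counting that length in amino acids is also an assumption.

```ruby
# Standard genetic code (NCBI translation table 1); codons are indexed by
# first, second and third base over the order 'TCAG'.
BASES = 'TCAG'.freeze
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'.freeze

# Translates an upper-case DNA string codon by codon. Codons containing
# anything other than A/C/G/T (e.g. N) become 'X'; a trailing partial
# codon is ignored.
def translate(dna)
  dna.scan(/.../).map do |codon|
    idx = codon.chars.map { |base| BASES.index(base) }
    idx.include?(nil) ? 'X' : AMINO_ACIDS[idx[0] * 16 + idx[1] * 4 + idx[2]]
  end.join
end

def reverse_complement(dna)
  dna.reverse.tr('ACGT', 'TGCA')
end

# Returns six translations: three forward frames, then three frames of
# the reverse complement. Input case does not matter.
def six_frame_translations(dna)
  seq = dna.upcase
  [seq, reverse_complement(seq)].flat_map do |strand|
    (0..2).map { |offset| translate(strand[offset..-1] || '') }
  end
end

# Extracts ORFs from the first M after a stop up to the next stop (the stop
# itself is dropped), keeping those with at least +min_length+ residues.
def extract_orfs(protein, min_length: 30)
  protein.scan(/M[^*]*\*/).map { |orf| orf.chomp('*') }.select { |orf| orf.length >= min_length }
end
```

For example, `six_frame_translations(contig).flat_map { |p| extract_orfs(p) }` would yield the candidate precursors for one contig under these assumptions.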
|
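The remaining criteria (cleavage sites, C-terminal glycine, acidic spacers and repeated peptides) can likewise be illustrated with a few string heuristics. The sketch below is hypothetical throughout and assumes the signal peptide has already been removed. The motif set and weights, the 10-residue window, the 50% acidic threshold and the 70% identity cut-off are placeholders chosen for illustration, not NpSearch's parameters. The greedy, longest-first clustering only imitates the idea behind CD-HIT (listed in the installation requirements); it does not reproduce it.

```ruby
# Hypothetical dibasic motifs and weights (KR scored highest for illustration).
CLEAVAGE_WEIGHTS = { 'KR' => 3, 'RR' => 2, 'RK' => 2, 'KK' => 1 }.freeze
CLEAVAGE_MOTIF = /KR|RR|RK|KK/

# Sums the weights of all non-overlapping cleavage motifs in the sequence.
def cleavage_score(protein)
  protein.scan(CLEAVAGE_MOTIF).inject(0) { |sum, motif| sum + CLEAVAGE_WEIGHTS.fetch(motif) }
end

# Splits a precursor at the motifs, dropping the motifs themselves and any
# fragment shorter than +min_length+.
def candidate_peptides(mature_protein, min_length: 3)
  mature_protein.split(CLEAVAGE_MOTIF).select { |pep| pep.length >= min_length }
end

# Counts candidate peptides ending in glycine, i.e. potential substrates for
# C-terminal amidation.
def c_terminal_glycine_count(peptides)
  peptides.count { |pep| pep.end_with?('G') }
end

# True if some window of +window+ residues is at least +threshold+ D/E.
def acidic_spacer?(protein, window: 10, threshold: 0.5)
  protein.chars.each_cons(window).any? do |win|
    win.count { |aa| aa == 'D' || aa == 'E' } >= threshold * window
  end
end

# Crude ungapped identity: matching positions divided by the longer length.
def identity(a, b)
  a.chars.zip(b.chars).count { |x, y| x == y }.to_f / [a.length, b.length].max
end

# Greedy clustering, longest peptide first: each peptide joins the first
# cluster whose representative it matches at >= +min_identity+.
def largest_cluster_size(peptides, min_identity: 0.7)
  clusters = [] # each entry: [representative, member_count]
  peptides.sort_by { |pep| -pep.length }.each do |pep|
    cluster = clusters.find { |rep, _count| identity(rep, pep) >= min_identity }
    if cluster
      cluster[1] += 1
    else
      clusters << [pep, 1]
    end
  end
  clusters.map(&:last).max || 0
end
```

A scoring function could combine these values with whatever weighting it prefers. How NpSearch itself weights them is not shown in this diff.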
@@ -31,12 +41,15 @@ NpSearch orders the results based on the following characteristics:

 ### Installation Requirements
 * Ruby (>= 2.0.0)
-* SignalP 4.1 (Available from [here](http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?signalp))
+* SignalP 4.1.*z (Available from [here](http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?signalp))
 * CD-HIT (Available from [here](http://weizhongli-lab.org/cd-hit/) - Suggested Installation via [Homebrew](http://brew.sh) or [Linuxbrew](http://linuxbrew.sh) - `brew install homebrew/science/cd-hit`)
 * EMBOSS (Available from [here](http://emboss.sourceforge.net) - Suggested Installation via [Homebrew](http://brew.sh) or [Linuxbrew](http://linuxbrew.sh) - `brew install homebrew/science/emboss`)


 ## Installation
+
+<strong>While in beta, we suggest running NpSearch from source (i.e. the method marked as not recommended below).</strong>
+
 Simply run the following command in the terminal.

 ```bash
@@ -52,7 +65,7 @@ It is also possible to run from source. However, this is not recommended.
 # Clone the repository.
 git clone https://github.com/wurmlab/npsearch.git

-# Move into NpSearch source directory.
+# Move into the NpSearch source directory.
 cd npsearch

 # Install bundler
@@ -86,35 +99,35 @@ npsearch
 You should see the following output.

 ```bash
-*
+* Description: A tool to identify novel neuropeptides.

-*
+* Usage: npsearch [Options] [Input File]

-
-
-
--s, --signalp_path The full path to the signalp script. This can be downloaded from
-CBS. See https://www.github.com/wurmlab/NpSearch for more
-information
--u, --usearch_path The full path to the usearch binary. This script can be downloaded
-from .... See https://www.github.com/wurmlab/NpSearch for more
+* Options
+-s path_to_signalp, The full path to the SignalP script. This can be downloaded from
+--signalp_path CBS. See https://www.github.com/wurmlab/NpSearch for more
 information
--
-
+-d, --temp_dir path_to_temp_dir The full path to the temp dir. NpSearch will create the folder and
+then delete the folder once it has finished using them.
+Default: Hidden folder in the current working directory
+-n, --num_threads num_of_threads The number of threads to use when analysing the input file
+-l, --min_orf_length N The minimum length of a potential neuropeptide precursor.
 Default: 30
+-m, --max_orf_length N The maximum length of a potential neuropeptide precursor.
+Default: 600
 -h, --help Display this screen
 -v, --version Shows version
-
 ```


-###
+### Exemplar Usage Scenario
 The following runs NpSearch on an input fasta dataset.

 ```bash
-npsearch -
+npsearch -s /path/to/signalp -n NUM_THREADS INPUT_FASTA_FILE
 ```

-##
-The output produced by NpSearch is presented in two manners. NpSearch produces a highly visual HTML file that can be open in any browsers (an example can seen [here]()) and a fasta file.
+## Note

+- With the current version of NpSearch, there is an issue with the number of threads used: it seems to use more threads than specified on the command line.
+- NpSearch is expected to produce a high system load (as shown in `top` / `htop`) because it runs SignalP as a separate process for each sequence (to speed things up). As a result, the system load (which reflects the number of processes running or waiting to run) can be higher than expected. This is normally not a cause for concern; however, we will probably try to find a middle ground between speed and the number of processes spawned (or maybe someone could rewrite SignalP in C with multicore support).
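The note above attributes the high load to launching SignalP as a separate process for each sequence. A common way to cap that is a fixed pool of worker threads pulling jobs from a queue. If each worker runs one external command at a time, then at most `num_threads` such commands are alive at once. The sketch below shows the pattern with Ruby's `Queue`, `Thread` and `Mutex`. `run_with_pool` is a hypothetical helper, not NpSearch's code. In real use, the block would wrap the SignalP invocation (for example via `Open3.capture2`), whose exact arguments are not shown here.

```ruby
require 'thread'

# Hypothetical helper (not part of NpSearch): calls +block+ once per item
# using at most +num_threads+ worker threads and returns the results in
# input order. Each worker handles one item at a time.
def run_with_pool(items, num_threads, &block)
  raise ArgumentError, 'num_threads must be at least 1' if num_threads < 1

  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [item, index] }
  results = Array.new(items.size)
  lock = Mutex.new

  workers = Array.new([num_threads, items.size].min) do
    Thread.new do
      loop do
        begin
          item, index = jobs.pop(true) # non-blocking; raises ThreadError once the queue is empty
        rescue ThreadError
          break
        end
        value = block.call(item)
        lock.synchronize { results[index] = value }
      end
    end
  end
  workers.each(&:join) # re-raises any exception raised inside a worker
  results
end
```

All jobs are enqueued before the workers start, so a non-blocking `pop` that finds the queue empty is a safe signal for a worker to stop.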
data/bin/npsearch
CHANGED
@@ -26,7 +26,7 @@ Banner
 opts.on('-d', '--temp_dir path_to_temp_dir',
 'The full path to the temp dir. NpSearch will create the folder and',
 ' then delete the folder once it has finished using them.',
-' Default: Hidden folder in the current working
+' Default: Hidden folder in the current working directory') do |p|
 opt[:temp_dir] = p
 end

@@ -37,17 +37,17 @@ Banner
 end

 opt[:min_orf_length] = 30
-opts.on('-
+opts.on('-l', '--min_orf_length N', Integer,
 'The minimum length of a potential neuropeptide precursor.',
 ' Default: 30') do |n|
 opt[:min_orf_length] = n
 end

-opt[:
-opts.on('-m', '--
+opt[:max_orf_length] = 600
+opts.on('-m', '--max_orf_length N', Integer,
 'The maximum length of a potential neuropeptide precursor.',
 ' Default: 600') do |n|
-opt[:
+opt[:max_orf_length] = n
 end

 opts.on('-h', '--help', 'Display this screen') do
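The `bin/npsearch` hunks above rename the maximum-length option to `--max_orf_length` and declare both length options with `Integer`. That makes `OptionParser` convert the argument to an integer and reject non-numeric input. A self-contained sketch of the same pattern follows. The option names, descriptions and defaults mirror the diff, but the `parse_length_options` wrapper is hypothetical and omits NpSearch's other options.

```ruby
require 'optparse'

# Hypothetical wrapper mirroring the -l/-m options shown in bin/npsearch.
# Returns the options hash and the remaining (non-option) arguments.
def parse_length_options(argv)
  opt = { min_orf_length: 30, max_orf_length: 600 }
  parser = OptionParser.new do |opts|
    opts.on('-l', '--min_orf_length N', Integer,
            'The minimum length of a potential neuropeptide precursor.',
            '  Default: 30') do |n|
      opt[:min_orf_length] = n
    end
    opts.on('-m', '--max_orf_length N', Integer,
            'The maximum length of a potential neuropeptide precursor.',
            '  Default: 600') do |n|
      opt[:max_orf_length] = n
    end
  end
  rest = parser.parse(argv) # does not modify argv; returns e.g. the input file
  [opt, rest]
end
```

With this pattern, passing `-l abc` makes `parse` raise `OptionParser::InvalidArgument`; how the real script reports that error is not shown in this diff.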
data/exemplar_data/README.md
File without changes

data/exemplar_data/genetic_data.fa
@@ -0,0 +1,465 @@
|
|
1
|
+
>isotig00001 gene=isogroup00003 length=2185 numContigs=5
|
2
|
+
TAGCTGTGATCTAGTGGATCTGACTGGCCTTTTGATTATTTCAGCacGATTCTCAGACTA
|
3
|
+
CAGTTGTAAaCCTACTTCGACTACTACTACTActagtacTAACGGTGCAACGTTGTTATA
|
4
|
+
AGTTTGCCAAAGGTGAAACTTTAGCCTTAGGACtGTGTTTATTTTATTTGCAGTCGCATT
|
5
|
+
CgCCTAACTGTTTTCTGTTACTGGGTGCATTTAACTCACATTAATAGAGGATTTTtGACT
|
6
|
+
AGTtCcTAGAGAGTGGTGTTTCTGTTTTACCACCATGGCAAAAAAGGGAAaGCCTCGCCC
|
7
|
+
TGACCATAGGCCTCCTGCACACAACCCGCATTATGCTCATGATCCACCACCTTATTCACA
|
8
|
+
ACAGCAACCACCACTTCAACAGCAGAACTATGCACAACAAATGCATCATGGTGGAGGTGG
|
9
|
+
TGGAAATAGACAACATGCACGACcTAGACCTAGTCCACCTTCAGAAGTCAGTGACTGTGT
|
10
|
+
CAAGTACTCCCTTTTCTtGTATAACTGCATCTTTTGGGTAAGTATGCATTCCTCATGACT
|
11
|
+
GTTATGTATATGTACGTATTTTAGGTCATCCTGCAAGCAGGAaCTCGCGAAGAAGCcTCA
|
12
|
+
TtGGCTTATcAAAGCcGCAAGCTGACCGAAGTCAGTcTcTtAGTTTCATATTtAACGTCC
|
13
|
+
ATGATTATGAaTTgTCTATTCTCAACAACTcTGTAACTGGATGACATACATTAATCTTGG
|
14
|
+
AGTGACTCGAACAGGGGACCTTATGATTGGAAGGCACCGGCCTTAACTTAACCACTGAGC
|
15
|
+
TAACACTCCACATCTTTCAAATTATGTATATAATATATCTTTCAAGATATCTTTCAAATT
|
16
|
+
ATACTGATTTGTCTAGTAAGTACAGTACTGTATCACAAACAGTTCAAAACCGACAAAGTG
|
17
|
+
CTACACAAACGCAAAGGTTTAAGGTATGGTAGTGTTTGTCTGATGGTATACCTTATCTTT
|
18
|
+
TTGGTGATAAGAGCAAAAATGTTCTTTTAATGGTTAAAGTGTAAAGAGGATGTCTTTGTT
|
19
|
+
TTtCTGTgAAGTTTAGTTGTAACTTTCAGATACAaGaAAAaGTGAAATGTGCAATGTACT
|
20
|
+
GTAAGCTCTCAGAGTTACTCAGTCCTTTAGTTtGCtCTGTGAGATATATGCTGTGAGATA
|
21
|
+
TGcTtCAACAGTTCAATTTTCTAACTAAAATTTACATTGGTCATGCAATTTCTTTGTTCG
|
22
|
+
TTTGGTTTCTTGTTTTGTTGGTTAGGTTTTGGTGCTTTAAATTACGATGAGGATATATAA
|
23
|
+
CAGAGTGTGTTTTCaAACAGCTGGCTGTTATCTGCAGAATCTGGTCACAaCAAGTATACA
|
24
|
+
ACCCGCcCGCGTATGGACATATTAATATACCTTTCTCTCATGTGCACTAGAGTTTTTCAT
|
25
|
+
TTAGTTACCAAAAAAATCAGTTCTGTGACACATTTTTAGGTTAAAGGTTCAAGGTTGGAG
|
26
|
+
AATCCAATAATCATTATACGGTGTGAAGACTCGCGCAAAAAGAACGGCtATGCCgTAATC
|
27
|
+
TGACCTaGTTTCGAATGAGGTGTAACAGAAGTGTTAGACACCACCATCGATCCCAGAAAA
|
28
|
+
TACACACACAGCTTGCTACCgTCGGTAATTAGACACTAGTGTACAGTCAgTACATACAGC
|
29
|
+
TGCAGTCAACACCCACAGCACAGTGTACAAACGGTACAGCGATGGACATCTCAGGTCCAG
|
30
|
+
CTAAAGATAACAATGTATCGCGTTTCATTACTGTCTGCATTTTGTAGCGACACGAACAAA
|
31
|
+
ACGTCACTTGCAAGCAACAGAAAGTTAACTTTTTCATATGGCTGCATGCGGTTTGGGgCG
|
32
|
+
AGTCTTCAGTGCCTTTAAAGTAGATGAAATGGATTGATCTTGAGGAGAAATGCCATCAGG
|
33
|
+
tTtCGTTGGCAAACGttCAGGATTTTGTCAGTTTTGCTGTAGTCACATTTAGCAAGATGA
|
34
|
+
CGACACAGAAAATATGACGTATAGTACTGCAAAGGAAGGAGCTTATccttTtcGTAATTT
|
35
|
+
taattGATtaaGGTtTCAATGCaaGCTTCCATACAGCTTTCAACAGCACATTCAGTTTAA
|
36
|
+
AGCAGTATATATGTGAGAACAAAAGGGGTTTTCCCAAAATATTGGgTACCcAAATgggTC
|
37
|
+
ACAGCAGACCATAGCAAaCTTTATAAGTGcGCATCttttGACACATATTGAagTGCATAA
|
38
|
+
TTtttCTAATAAATTCTTTaaaata
|
39
|
+
>isotig00002 gene=isogroup00003 length=1914 numContigs=5
|
40
|
+
TGAATGAGAAAtGAAATTTAGCGAAGAAATCACCTTGTAAATTAAAAACTAAAATGGCTT
|
41
|
+
TCACACAAATTAaCAGTAAAtGgAGAATGTTTTTAAAGCAATATATGCAGTACAGCcATT
|
42
|
+
CATTGGAAAACAGTAAcAAAaTACATTTATCTTGTtcATTTTtACctCctGCAAaacTTA
|
43
|
+
cAaCcGTTAATTATGTAGATTGGATGGCACTAACAGGGTACTTGTCTTATCTGCCTATTG
|
44
|
+
GATAATGTGGcATTAATACTACTGTGTATGGGCACTGAGGCTGAGAGTGCAGTAAGTTtA
|
45
|
+
AAGGCATTGAAGACTCtCCCCGAaCcGCGtGCCGGGCTctGAAAAAGTtAaCTGCTCGCA
|
46
|
+
AaTtAcGTTTtCTtCTTGTCaCTaCAAAaTGCAGACATTaaTGAAACGTGATACCTTGTt
|
47
|
+
ATCTTTTATCTAGACCTGAGATGTCcAtCGCTGCTATgTACAcTGTGTTGTGGGTATTGA
|
48
|
+
CcgTAGCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGACGGTAGCAAGCTGTGT
|
49
|
+
GTGTATTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACACCtCAtTcGAAACTA
|
50
|
+
GGTCAGAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACAcccttttaaggagaa
|
51
|
+
gtattttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGGTAGCATGCAACTTAA
|
52
|
+
AAtTTTGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATTACAAGGttAAtCtac
|
53
|
+
TGCCCTTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCTTtCAaTAaTTATACA
|
54
|
+
AACAAATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTCAAGTTTATCAATGTA
|
55
|
+
ATAAGTTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCTGTGGATAAGACTGCC
|
56
|
+
AGACTATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCCTTCACCTCCTTGCAG
|
57
|
+
ATGTACCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATTAAAATCTGGCTTCCT
|
58
|
+
cCTGAGCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGTGTGAAACCACTGAAA
|
59
|
+
GATCTTCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGACAGAGGCCACACTGA
|
60
|
+
TACCATTCCACTCTCATAAGATGAAGGTATCACACACACTTCATTTTGCTTCTGCGATGC
|
61
|
+
AGTGCCTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAaCTGCCtGGTTtAtAG
|
62
|
+
AGCTCTGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTGTCATGGCCTTGAGCA
|
63
|
+
AGTTGAACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTAGGTACTGTcAAATCC
|
64
|
+
ACATCACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAAAGAGTACAATGAGGG
|
65
|
+
TTTTTTAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGCAGCTGaaaaGATATT
|
66
|
+
CAGAAaTTGTTATATATGAGTGTGTTTGTATGCATGCAtATGtGTGATTTtCTtGCTTTA
|
67
|
+
CAGAACAGCTCCaTTTTGATAAGCTAtgTAAcgtGgAAACCTGCCAATCAaTGTTtgAAa
|
68
|
+
taGGAcaGgCTGAAACGATTCTTAAATGAAAAGCTTAAtgaCTTcTTgCAtttttaTACA
|
69
|
+
TCACTGTTCAGGtAaGGCCAGTAAGGgCAGTATgAaGAAtAaGTAACAATtAATAATTAT
|
70
|
+
CATTATGGCCATTTGCTGtcTGCATAAtAaCAAACTGAATGATGTCATCAGCCCTgTGCT
|
71
|
+
CAGTTGACAgAACTGACAAGTAGGCACACaaTGTCAGTGTGATCCATGAAACCT
|
72
|
+
>isotig00003 gene=isogroup00003 length=1917 numContigs=7
|
73
|
+
TAGCTGTGATCTAGTGGATCTGACTGGCCTTTTGATTATTTCAGCacGATTCTCAGACTA
|
74
|
+
CAGTTGTAAaCCTACTTCGACTACTACTACTActagtacTAACGGTGCAACGTTGTTATA
|
75
|
+
AGTTTGCCAAAGGTGAAACTTTAGCCTTAGGACtGTGTTTATTTTATTTGCAGTCGCATT
|
76
|
+
CgCCTAACTGTTTTCTGTTACTGGGTGCATTTAACTCACATTAATAGAGGATTTTtGACT
|
77
|
+
AGTtCcTAGAGAGTGGTGTTTCTGTTTTACCACCATGGCAAAAAAGGGAAaGCCTCGCCC
|
78
|
+
TGACCATAGGCCTCCTGCACACAACCCGCATTATGCTCATGATCCACCACCTTATTCACA
|
79
|
+
ACAGCAACCACCACTTCAACAGCAGAACTATGCACAACAAATGCATCATGGTGGAGGTGG
|
80
|
+
TGGAAATAGACAACATGCACGACcTAGACCTAGTCCACCTTCAGAAGTCAGTGACTGTGT
|
81
|
+
CAAGTACTCCCTTTTCTtGTATAACTGCATCTTTTGGGTAAGTATGCATTCCTCATGACT
|
82
|
+
GTTATGTATATGTACGTATTTTAGGTCATCCTGCAAGCAGGAaCTCGCGAAGAAGCcTCA
|
83
|
+
TtGGCTTATcAAAGCcGCAAGCTGACCGAAGTCAGTcTcTtAGTTTCATATTtAACGTCC
|
84
|
+
ATGATTATGAaTTgTCTATTCTCAACAACTcTGTAACTGGATGACATACATTAATCTTGG
|
85
|
+
AGTGACTCGAACAGGGGACCTTATGATTGGAAGGCACCGGCCTTAACTTAACCACTGAGC
|
86
|
+
TAACACTCCACATCTTTCAAATTATGTATATAATATATCTTTCAAGATATCTTTCAAATT
|
87
|
+
ATACTGATTTGTCTAGTAAGTACAGTACTGTATCACAAACAGTTCAAAACCGACAAAGTG
|
88
|
+
CTACACAAACGCAAAGGTTTAAGGTATGGTAGTGTTTGTCTGATGGTATACCTTATCTTT
|
89
|
+
TTGGTGATAAGAGCAAAAATGTTCTTTTAATGGTTAAAGTGTAAAGAGGATGTCTTTGTT
|
90
|
+
TTtCTGTgAAGTTTAGTTGTAACTTTCAGATACAaGaAAAaGTGAAATGTGCAATGTACT
|
91
|
+
GTAAGCTCTCAGAGTTACTCAGTCCTTTAGTTtGCtCTGTGAGATATATGCTGTGAGATA
|
92
|
+
TGcTtCAACAGTTCAATTTTCTAACTAAAATTTACATTGGTCATGCAATTTCTTTGTTCG
|
93
|
+
TTTGGTTTCTTGTTTTGTTGGTTAGGTTTTGGTGCTTTAAATTACGATGAGGATATATAA
|
94
|
+
CAGAGTGTGTTTTCaAACAGCTGGCTGTTATCTGCAGAATCTGGTCACAaCAAGTATACA
|
95
|
+
ACCCGCcCGCGTATGGACATATTAATATACCTTTCTCTCATGTGCACTAGAGTTTTTCAT
|
96
|
+
TTAGTTACCAAAAAAATCAGTTCTGTGACACATTTTTAGGTTAAAGGTTCAAGGTTGGAG
|
97
|
+
AATCCAATAATCATTATACGGTGTGAAGACTCGCGCAAAAAGAACGGCtATGCCgTAATC
|
98
|
+
TGACCTaGTTTCGAATGAGGTGTAACAGAAGTGTTAGACACCACCATCGATCCCAGAAAA
|
99
|
+
TACACACACAGCTTGCTACCgTCGGTAATTAGACACTAGTGTACAGTCAgTACATACAGC
|
100
|
+
TaCGGTCAATACCCAcaaaaCaGTGtACaTAGCAGCGaTGGACATcTCAGGTCCAGATAA
|
101
|
+
AGATAACAAGGTATCACGTTTCATTACTGTCTGCaTTTTGTAGCgACAaGAAGAAAACTt
|
102
|
+
CACTtGCAAGCAACGgAAAGTTAACTTTTtCAGAGCGCGGCACGCGGGTTGGGGCAAGTC
|
103
|
+
TTCCAAGCCTTTAAGTtGACAtcTTGCCTTTGGCTATCCAGGgTGACAAGATGATACTAG
|
104
|
+
CAGGTAgagtgactaattgagccctgtgtgagaaaccaatgcagaatctagcctagt
|
105
|
+
>isotig00004 gene=isogroup00003 length=1896 numContigs=6
|
106
|
+
TAGCTGTGATCTAGTGGATCTGACTGGCCTTTTGATTATTTCAGCacGATTCTCAGACTA
|
107
|
+
CAGTTGTAAaCCTACTTCGACTACTACTACTActagtacTAACGGTGCAACGTTGTTATA
|
108
|
+
AGTTTGCCAAAGGTGAAACTTTAGCCTTAGGACtGTGTTTATTTTATTTGCAGTCGCATT
|
109
|
+
CgCCTAACTGTTTTCTGTTACTGGGTGCATTTAACTCACATTAATAGAGGATTTTtGACT
|
110
|
+
AGTtCcTAGAGAGTGGTGTTTCTGTTTTACCACCATGGCAAAAAAGGGAAaGCCTCGCCC
|
111
|
+
TGACCATAGGCCTCCTGCACACAACCCGCATTATGCTCATGATCCACCACCTTATTCACA
|
112
|
+
ACAGCAACCACCACTTCAACAGCAGAACTATGCACAACAAATGCATCATGGTGGAGGTGG
|
113
|
+
TGGAAATAGACAACATGCACGACcTAGACCTAGTCCACCTTCAGAAGTCAGTGACTGTGT
|
114
|
+
CAAGTACTCCCTTTTCTtGTATAACTGCATCTTTTGGGTAAGTATGCATTCCTCATGACT
|
115
|
+
GTTATGTATATGTACGTATTTTAGGTCATCCTGCAAGCAGGAaCTCGCGAAGAAGCcTCA
|
116
|
+
TtGGCTTATcAAAGCcGCAAGCTGACCGAAGTCAGTcTcTtAGTTTCATATTtAACGTCC
|
117
|
+
ATGATTATGAaTTgTCTATTCTCAACAACTcTGTAACTGGATGACATACATTAATCTTGG
|
118
|
+
AGTGACTCGAACAGGGGACCTTATGATTGGAAGGCACCGGCCTTAACTTAACCACTGAGC
|
119
|
+
TAACACTCCACATCTTTCAAATTATGTATATAATATATCTTTCAAGATATCTTTCAAATT
|
120
|
+
ATACTGATTTGTCTAGTAAGTACAGTACTGTATCACAAACAGTTCAAAACCGACAAAGTG
|
121
|
+
CTACACAAACGCAAAGGTTTAAGGTATGGTAGTGTTTGTCTGATGGTATACCTTATCTTT
|
122
|
+
TTGGTGATAAGAGCAAAAATGTTCTTTTAATGGTTAAAGTGTAAAGAGGATGTCTTTGTT
|
123
|
+
TTtCTGTgAAGTTTAGTTGTAACTTTCAGATACAaGaAAAaGTGAAATGTGCAATGTACT
|
124
|
+
GTAAGCTCTCAGAGTTACTCAGTCCTTTAGTTtGCtCTGTGAGATATATGCTGTGAGATA
|
125
|
+
TGcTtCAACAGTTCAATTTTCTAACTAAAATTTACATTGGTCATGCAATTTCTTTGTTCG
|
126
|
+
TTTGGTTTCTTGTTTTGTTGGTTAGGTTTTGGTGCTTTAAATTACGATGAGGATATATAA
|
127
|
+
CAGAGTGTGTTTTCaAACAGCTGGCTGTTATCTGCAGAATCTGGTCACAaCAAGTATACA
|
128
|
+
ACCCGCcCGCGTATGGACATATTAATATACCTTTCTCTCATGTGCACTAGAGTTTTTCAT
|
129
|
+
TTAGTTACCAAAAAAATCAGTTCTGTGACACATTTTTAGGTTAAAGGTTCAAGGTTGGAG
|
130
|
+
AATCCAATAATCATTATACGGTGTGAAGACTCGCGCAAAAAGAACGGCtATGCCgTAATC
|
131
|
+
TGACCTaGTTTCGAATGAGGTGTAACAGAAGTGTTAGACACCACCATCGATCCCAGAAAA
|
132
|
+
TACACACACAGCTTGCTACCgTCGGTAATTAGACACTAGTGTACAGTCAgTACATACAGC
|
133
|
+
TaCGGTCAATACCCAcaaaaCaGTGtACaTAGCAGCGaTGGACATcTCAGGTCCAGATAA
|
134
|
+
AGATAACAAGGTATCACGTTTCATTACTGTCTGCaTTTTGTAGCgACAaGAAGAAAACTt
|
135
|
+
CACTtGCAAGCAACGgAAAGTTAACTTTTtCAGAGGGCAGCACTTGGTTTGGAGCGAATC
|
136
|
+
TTCAATGCCTTTAAGTCATCCTTTACTAGATGGAAGCTCTTCTTATGTAGTTTACTCttc
|
137
|
+
ATACTATCAAGACATTCTTAATGATATACTATGCTT
|
138
|
+
>isotig00005 gene=isogroup00003 length=1789 numContigs=6
|
139
|
+
ACATTCTTCAAGAGCTCTGCACCCACCAATCTAAAGTGACCAGCCAAGTGACTGACCTCA
|
140
|
+
GGGCACAGTTAGCAGCTTTGACCACAGGATGAGCTATGTAACAACTGAAtgaaTGGTGTT
|
141
|
+
CAtcGTTGATTGGGCAgTCAAAACAGCTGAATTTCTCTTGCGgAAGACATAAAGGCATTG
|
142
|
+
AAGACtcGCCcAAaccGtGTGcgcccTCTGAAAAaGTTAACTTTctGTTgCTTGCAaGTG
|
143
|
+
AAGTTTtcTtCTtGTCgCTACAAAATGCAGACAGTAaTgAAACGTGATACcTtGTtATCT
|
144
|
+
TTtATCTAgACctGAGATGtCcACGCTGCTATGTACACTGTGTTGTGGgTATTGACcGTA
|
145
|
+
GCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGACGGTAGCAAGCTGTGTGTGTA
|
146
|
+
TTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACACCtCAtTcGAAACTAGGTCA
|
147
|
+
GAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACAcccttttaaggagaagtatt
|
148
|
+
ttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGGTAGCATGCAACTTAAAAtTT
|
149
|
+
TGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATTACAAGGttAAtCtacTGCCC
|
150
|
+
TTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCTTtCAaTAaTTATACAAACAA
|
151
|
+
ATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTCAAGTTTATCAATGTAATAAG
|
152
|
+
TTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCTGTGGATAAGACTGCCAGACT
|
153
|
+
ATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCCTTCACCTCCTTGCAGATGTA
|
154
|
+
CCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATTAAAATCTGGCTTCCTcCTGA
|
155
|
+
GCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGTGTGAAACCACTGAAAGATCT
|
156
|
+
TCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGACAGAGGCCACACTGATACCA
|
157
|
+
TTCCACTCTCATAAGATGAAGGTATCACACACACTTCATTTTGCTTCTGCGATGCAGTGC
|
158
|
+
CTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAaCTGCCtGGTTtAtAGAGCTC
|
159
|
+
TGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTGTCATGGCCTTGAGCAAGTTG
|
160
|
+
AACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTAGGTACTGTcAAATCCACATC
|
161
|
+
ACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAAAGAGTACAATGAGGGTTTTT
|
162
|
+
TAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGCAGCTGaaaaGATATTCAGAA
|
163
|
+
aTTGTTATATATGAGTGTGTTTGTATGCATGCAtATGtGTGATTTtCTtGCTTTACAGAA
|
164
|
+
CAGCTCCaTTTTGATAAGCTAtgTAAcgtGgAAACCTGCCAATCAaTGTTtgAAataGGA
|
165
|
+
caGgCTGAAACGATTCTTAAATGAAAAGCTTAAtgaCTTcTTgCAtttttaTACATCACT
|
166
|
+
GTTCAGGtAaGGCCAGTAAGGgCAGTATgAaGAAtAaGTAACAATtAATAATTATCATTA
|
167
|
+
TGGCCATTTGCTGtcTGCATAAtAaCAAACTGAATGATGTCATCAGCCCTgTGCTCAGTT
|
168
|
+
GACAgAACTGACAAGTAGGCACACaaTGTCAGTGTGATCCATGAAACCT
|
169
|
+
>isotig00006 gene=isogroup00003 length=1747 numContigs=6
|
170
|
+
AGTTAAAAGTTGAAAAATTGGTGACCATATTTTGACACTCTAGCATATTTGGGAGCTATA
|
171
|
+
TACTGATTTGGGTTTCACCATGCACAGATGAGGTATATACATAAGTTGAAAGCCTGCAGC
|
172
|
+
TCTATATTAAAGGCATTGAAGACtcGCCcAAaccgtgTGcgcccTCTGAAAAaGTTAACT
|
173
|
+
TTCcGTTgCTTGCAaGTGAAGTTTtcTtCTTGTCGCTACAAAATGCAGACAGTAATGAAA
|
174
|
+
CGTGATACcTtGTtATCTTTtATCTAgACcTGAGATGtCcACGCTGCTATGTACACTGTG
|
175
|
+
TTGTGGgTATTGACcGTAGCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGACGG
|
176
|
+
TAGCAAGCTGTGTGTGTATTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACACC
|
177
|
+
tCAtTcGAAACTAGGTCAGAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACAcc
|
178
|
+
cttttaaggagaagtattttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGGTA
|
179
|
+
GCATGCAACTTAAAAtTTTGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATTAC
|
180
|
+
AAGGttAAtCtacTGCCCTTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCTTt
|
181
|
+
CAaTAaTTATACAAACAAATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTCAA
|
182
|
+
GTTTATCAATGTAATAAGTTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCTGT
|
183
|
+
GGATAAGACTGCCAGACTATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCCTT
|
184
|
+
CACCTCCTTGCAGATGTACCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATTAA
|
185
|
+
AATCTGGCTTCCTcCTGAGCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGTGT
|
186
|
+
GAAACCACTGAAAGATCTTCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGACA
|
187
|
+
GAGGCCACACTGATACCATTCCACTCTCATAAGATGAAGGTATCACACACACTTCATTTT
|
188
|
+
GCTTCTGCGATGCAGTGCCTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAaCT
|
189
|
+
GCCtGGTTtAtAGAGCTCTGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTGTC
|
190
|
+
ATGGCCTTGAGCAAGTTGAACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTAGG
|
191
|
+
TACTGTcAAATCCACATCACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAAAG
|
192
|
+
AGTACAATGAGGGTTTTTTAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGCAG
|
193
|
+
CTGaaaaGATATTCAGAAaTTGTTATATATGAGTGTGTTTGTATGCATGCAtATGtGTGA
|
194
|
+
TTTtCTtGCTTTACAGAACAGCTCCaTTTTGATAAGCTAtgTAAcgtGgAAACCTGCCAA
|
195
|
+
TCAaTGTTtgAAataGGAcaGgCTGAAACGATTCTTAAATGAAAAGCTTAAtgaCTTcTT
|
196
|
+
gCAtttttaTACATCACTGTTCAGGtAaGGCCAGTAAGGgCAGTATgAaGAAtAaGTAAC
|
197
|
+
AATtAATAATTATCATTATGGCCATTTGCTGtcTGCATAAtAaCAAACTGAATGATGTCA
|
198
|
+
TCAGCCCTgTGCTCAGTTGACAgAACTGACAAGTAGGCACACaaTGTCAGTGTGATCCAT
|
199
|
+
GAAACCT
|
200
|
+
>isotig00007 gene=isogroup00003 length=1749 numContigs=5
|
201
|
+
TGTGTGTGTGTGGTGCTTCCccTCTAGGGCTGTAAATTTCAAAGGAACCTTGCGCAAGAA
|
202
|
+
CAGtAGCTTGCGaCGTTTTTCAAaaCCAGAGGTTCTGAACTGAACTGTACTGACTACTGT
|
203
|
+
AGGGtacTTAAaGGCATTGAAGACTCGCCcAAaCCatgTGCCGCGctttGAAAAAGTTAA
|
204
|
+
CTTTCCGTTGCTTGCAAATGAcGTTTtcTtCTtGTCgCTACAAAATGCAGACAGTAaTgA
|
205
|
+
AACGTGATACcTtGTtATCTTTtATCTAgACctGAGATGtCcACGCTGCTATGTACACTG
|
206
|
+
TGTTGTGGgTATTGACcGTAGCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGAC
|
207
|
+
GGTAGCAAGCTGTGTGTGTATTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACA
|
208
|
+
CCtCAtTcGAAACTAGGTCAGAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACA
|
209
|
+
cccttttaaggagaagtattttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGG
|
210
|
+
TAGCATGCAACTTAAAAtTTTGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATT
|
211
|
+
ACAAGGttAAtCtacTGCCCTTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCT
|
212
|
+
TtCAaTAaTTATACAAACAAATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTC
|
213
|
+
AAGTTTATCAATGTAATAAGTTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCT
|
214
|
+
GTGGATAAGACTGCCAGACTATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCC
|
215
|
+
TTCACCTCCTTGCAGATGTACCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATT
|
216
|
+
AAAATCTGGCTTCCTcCTGAGCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGT
|
217
|
+
GTGAAACCACTGAAAGATCTTCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGA
|
218
|
+
CAGAGGCCACACTGATACCATTCCACTCTCATAAGATGAAGGTATCACACACACTTCATT
|
219
|
+
TTGCTTCTGCGATGCAGTGCCTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAa
|
220
|
+
CTGCCtGGTTtAtAGAGCTCTGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTG
|
221
|
+
TCATGGCCTTGAGCAAGTTGAACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTA
|
222
|
+
GGTACTGTcAAATCCACATCACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAA
|
223
|
+
AGAGTACAATGAGGGTTTTTTAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGC
|
224
|
+
AGCTGaaaaGATATTCAGAAaTTGTTATATATGAGTGTGTTTGTATGCATGCAtATGtGT
|
225
|
+
GATTTtCTtGCTTTACAGAACAGCTCCaTTTTGATAAGCTAtgTAAcgtGgAAACCTGCC
|
226
|
+
AATCAaTGTTtgAAataGGAcaGgCTGAAACGATTCTTAAATGAAAAGCTTAAtgaCTTc
|
227
|
+
TTgCAtttttaTACATCACTGTTCAGGtAaGGCCAGTAAGGgCAGTATgAaGAAtAaGTA
|
228
|
+
ACAATtAATAATTATCATTATGGCCATTTGCTGtcTGCATAAtAaCAAACTGAATGATGT
|
229
|
+
CATCAGCCCTgTGCTCAGTTGACAgAACTGACAAGTAGGCACACaaTGTCAGTGTGATCC
|
230
|
+
ATGAAACCT
|
231
|
+
>isotig00008 gene=isogroup00003 length=1726 numContigs=6
|
232
|
+
AGGTTTCATGGATCACACTGACAtTGTGTGCCTACTTGTCAGTTcTGTCAACTGAGCAcA
|
233
|
+
GGGCTGATGACATCATTCAGTTTGttattATGCAggaCAGCAAATGGCCATAATGATAAT
|
234
|
+
TATTAaTTGTTACTtaTTCTtcATACTGCCcTTACTGGCCTtaCCTGAACAGTGATGTAt
|
235
|
+
caaaaTGcAAgAAGtcaTTAAGCTTTTCATTTAAGAATCGTTTCAGCctgTCCtaatTTt
|
236
|
+
cAAaCAtTGATTGGCAGGTTTCcacgTTAcaTAGCTTATCAAAAtGGAGCTGTTCTGTAA
|
237
|
+
AGCAAGaAAATCACaCATaTGCATGCATACAAACACACTCATATATAACAAtTTCTGAAT
|
238
|
+
ATCTtttCAGCTGCCCAAAGGTACTGGAATGCAAGCTCTGTTGGGTGAATTAAAAAaCCc
|
239
|
+
TCATTGTACTCTTTTATCATGGTCAGCGTAGCTGGAACCAGCAATGATGGTGATGTGGAT
|
240
|
+
TTGACAGTACCTAGACAAGAATTCAATAGCTGCTTTGTGTTGAAGTGGGTTCAACTTGCT
|
241
|
+
CAAGGCCATGACAGGATGACGACTAGACATtAGGATGAATTATCTGTTGCAGAGCTCTAT
|
242
|
+
AAaCCAGGCAGTtGTtAAAaCATCCATTGGGTGACCcTCACAGTCTACCAGGCACTGCAT
|
243
|
+
CGCAGAAGCAAAATGAAGTGTGTGTgATACCTTCATCTTATGAGAGTGGAATGGTATCAG
|
244
|
+
TGTGGCCTCTGTCAATTTGGCTTCAAGTCTTTGTTTTCTTGAAAGCTGAGAaGATCTTTC
|
245
|
+
AGTGGTTTCACACTGACCTCTGTACTTGACAATTCATGTGTATTTGCCAGCTCAGgAGGA
|
246
|
+
AGCCAGATTTTAATTACATTAACCAATGCTGACTTTTTTttGGACATGTGGTACATCTGC
|
247
|
+
AAGGAGGTGAAGGAAGCAGTTGGAGTTGCATGGATGTGGAATCTTGTTGATAGTCTGGCA
|
248
|
+
GTCTTATCCACAGATTATCCCAAAGCTTCTCCACATTGCCACATTCAGAACTTATTACAT
|
249
|
+
TGATAAACTTGAAAATtGCAGGAATCTAaCcAaGCACCcATCAaGGGAaTTTGTTTGTAT
|
250
|
+
AATtATtGAAaGCTGTGACcTTCTGATGTGACAGACTAATGTGAAaTAAAGGgCAgtaGa
|
251
|
+
TTaCCTTGTaaTGAACCttGTTATTGTTTGATTGTATCTAAtGTTTGCAaaTTTTAAGTT
|
252
|
+
GCATGCTACCAATTGAAACATAATTCTTTCTCTAttaatgggatataaaatacttctcct
|
253
|
+
taaaagggTGTgAaGACTcggCACAAAGAAACGTCtaTGCcGgtAaTCTGACCTAGTTTc
|
254
|
+
gAatGaGGTGTAACagAAGTgTtAGACACcACCAttGATCCcAGAAAATACACACACAGC
|
255
|
+
TTGCTACCGTCGGTAaTTAGACACTAGTGTACAGTCAaTACATACAGCTAcGgTCAATAC
|
256
|
+
CCACAaCACAgTGTAcATAGCAGCGaTGgACATCTCAGGTCTAGATAAAAGATAaCAAGG
|
257
|
+
TATCACGTTTCATtaCTGTCTGCATTTtGTAGCgaCAagAAGAAAAcgtCATTtGCAAGC
|
258
|
+
AaTGgAAAGTtAACTTTTTCaGAGCGcagCAcGCgggTTGGGGCAAGTCTTCCAAGCCTT
|
259
|
+
TAAGTtGACAtcTTGCCTTTGGCTATCCAGGgTGACAAGATGATACTAGCAGGTAgagtg
|
260
|
+
actaattgagccctgtgtgagaaaccaatgcagaatctagcctagt
|
261
|
+
>isotig00009 gene=isogroup00003 length=1827 numContigs=2
|
262
|
+
TAGCTGTGATCTAGTGGATCTGACTGGCCTTTTGATTATTTCAGCacGATTCTCAGACTA
|
263
|
+
CAGTTGTAAaCCTACTTCGACTACTACTACTActagtacTAACGGTGCAACGTTGTTATA
|
264
|
+
AGTTTGCCAAAGGTGAAACTTTAGCCTTAGGACtGTGTTTATTTTATTTGCAGTCGCATT
|
265
|
+
CgCCTAACTGTTTTCTGTTACTGGGTGCATTTAACTCACATTAATAGAGGATTTTtGACT
|
266
|
+
AGTtCcTAGAGAGTGGTGTTTCTGTTTTACCACCATGGCAAAAAAGGGAAaGCCTCGCCC
|
267
|
+
TGACCATAGGCCTCCTGCACACAACCCGCATTATGCTCATGATCCACCACCTTATTCACA
|
268
|
+
ACAGCAACCACCACTTCAACAGCAGAACTATGCACAACAAATGCATCATGGTGGAGGTGG
|
269
|
+
TGGAAATAGACAACATGCACGACcTAGACCTAGTCCACCTTCAGAAGTCAGTGACTGTGT
|
270
|
+
CAAGTACTCCCTTTTCTtGTATAACTGCATCTTTTGgaTTGtCGGCCTTttCTTTATtGC
|
271
|
+
AGCAGGTATCTGGgCATTTCACGATAGGGGTGTTTTTAATGAATTCCAGTCACTTAGTAC
|
272
|
+
CAATGAGGTCTCCTTTCTCACTGATCCTGTTATTTGGCTGTTCGTCCTCGGAGGTGTAGT
|
273
|
+
TTTCATGCTGGGAACCCTCGGATGTCTgGGGgCCCTCAGAGAAAaTATCTGCATGCTGAA
|
274
|
+
GTGTTTTAGCATAATCATGGGGCTTATACTGCTGCTGGAAATTGGAGGTGGATGTGCGAT
|
275
|
+
ATACTTCTATCGTGCACAGATTCAGGCACAGTTTCAAAAGTCCTTAACAGATGTGaCCAT
|
276
|
+
AACAGATTACAGAGAAAATGCTGATTTCCAGGATCTCATAGACGCATTACAATCCGGTCT
|
277
|
+
TTCTTGTTGTGGTGTCAATTCCTatGAAGACTGGGATAATAATATTTATTTCAACTGTAG
|
278
|
+
TGGTCCTGCCAATAACCCTGAAGCcttGTGGTGTGCCTTtCTCCTGTTGTATACCGGATC
|
279
|
+
AAGCAAGCGGAGTAGCCAACACCCAGTGCGGTTATGGAGTTCGTTCCCCCGAACAACAAA
|
280
|
+
ATACTTTCCACACAAAGATTTACACCACTGGCTGTGCGGATATGTTTACAATGTGGATTA
|
281
|
+
ATAGGTACCTATATTACATAGCAGGCATTGCTGGGGTCATTGTCTTGGTCGAGTtGTTTG
|
282
|
+
GATTCTGTTTTGCACATTCCCTCATCAACGACATCAAACGCCAAAAGGCCCGCTGGGCGC
|
283
|
+
ATCGATAATTCATTCCAGGATGTTGGTGgATGATGCTACTCAAGGGagAAGACTGACAGT
|
284
|
+
GCCTTTtGGTCAaTATCGTGTAGCATCAGGAAGGAGGTAGTACCTCCTCAACTAACCaTA
|
285
|
+
ACAGAATTTGTCCAGTTTGTAACATCGTCAAGAAATAAACAGACTTTTTTTACCATTAGG
|
286
|
+
ACgTGATAATACTACCACGTAACCTCTCAAAGCACAAAAAGCAAAAAGCAAATATCTCCT
TGTTTTAAAATTAGaagGTCTATCTCAGATAACAACCACAGAACATgTGGAGTTTTCCtT
TATGCTATCATAAAGATATAAATATATATAAAATTGAGGTAGcATCtTGGCTACCCACCA
AAATCATTTTTTTTCCAGTTTGaAACATCATGGAACATTTCAGAACAAAGATCATTTCAG
TCGTTACCACACTCAAGAgaTTGCTGTcGTCAaCaTTTtGtaGCTTTTtAAtGTCTTGAT
CTTCGTCGACATCGTCAATGTGTAAACTATTCTCGACGAGAGATTAGTGTCTAATACTGC
GGGTgATTTGATATAAATCTCACTTGG
>isotig00010 gene=isogroup00003 length=1650 numContigs=5
TGAATGAGAAAtGAAATTTAGCGAAGAAATCACCTTGTAAATTAAAAACTAAAATGGCTT
TCACACAAATTAaCAGTAAAtGgAGAATGTTTTTAAAGCAATATATGCAGTACAGCcATT
CATTGGAAAACAGTAAcAAAaTACATTTATCTTGTtcATTTTtACctCctGCAAaacTTA
cAaCcGTTAATTATGTAGATTGGATGGCACTAACAGGGTACTTGTCTTATCTGCCTATTG
GATAATGTGGcATTAATACTACTGTGTATGGGCACTGAGGCTGAGAGTGCAGTAAGTTtA
AAGGCATTGAAGACTCtCCCCGAaCcGCGtGCCGGGCTctGAAAAAGTtAaCTGCTCGCA
AaTtAcGTTTtCTtCTTGTCaCTaCAAAaTGCAGACATTaaTGAAACGTGATACCTTGTt
ATCTTTTATCTAGACCTGAGATGTCcAtCGCTGCTATgTACAcTGTGTTGTGGGTATTGA
CcgTAGCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGACGGTAGCAAGCTGTGT
GTGTATTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACACCtCAtTcGAAACTA
GGTCAGAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACAcccttttaaggagaa
gtattttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGGTAGCATGCAACTTAA
AAtTTTGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATTACAAGGttAAtCtac
TGCCCTTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCTTtCAaTAaTTATACA
AACAAATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTCAAGTTTATCAATGTA
ATAAGTTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCTGTGGATAAGACTGCC
AGACTATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCCTTCACCTCCTTGCAG
ATGTACCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATTAAAATCTGGCTTCCT
cCTGAGCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGTGTGAAACCACTGAAA
GATCTTCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGACAGAGGCCACACTGA
TACCATTCCACTCTCATAAGATGAAGGTATCACACACACTTCATTTTGCTTCTGCGATGC
AGTGCCTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAaCTGCCtGGTTtAtAG
AGCTCTGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTGTCATGGCCTTGAGCA
AGTTGAACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTAGGTACTGTcAAATCC
ACATCACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAAAGAGTACAATGAGGG
TTTTTTAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGCAGCTGATATCCATTT
TGTTCCTCGTATgCCTGTCAAAATCTGACATTctGagTCGCTTCGTTTGTTCGCAACGAG
CACAGTGTGCAAAGctGCTATATATTGTCC
>isotig00011 gene=isogroup00003 length=1525 numContigs=6
ACATTCTTCAAGAGCTCTGCACCCACCAATCTAAAGTGACCAGCCAAGTGACTGACCTCA
GGGCACAGTTAGCAGCTTTGACCACAGGATGAGCTATGTAACAACTGAAtgaaTGGTGTT
CAtcGTTGATTGGGCAgTCAAAACAGCTGAATTTCTCTTGCGgAAGACATAAAGGCATTG
AAGACtcGCCcAAaccGtGTGcgcccTCTGAAAAaGTTAACTTTctGTTgCTTGCAaGTG
AAGTTTtcTtCTtGTCgCTACAAAATGCAGACAGTAaTgAAACGTGATACcTtGTtATCT
TTtATCTAgACctGAGATGtCcACGCTGCTATGTACACTGTGTTGTGGgTATTGACcGTA
GCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGACGGTAGCAAGCTGTGTGTGTA
TTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACACCtCAtTcGAAACTAGGTCA
GAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACAcccttttaaggagaagtatt
ttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGGTAGCATGCAACTTAAAAtTT
TGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATTACAAGGttAAtCtacTGCCC
TTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCTTtCAaTAaTTATACAAACAA
ATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTCAAGTTTATCAATGTAATAAG
TTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCTGTGGATAAGACTGCCAGACT
ATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCCTTCACCTCCTTGCAGATGTA
CCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATTAAAATCTGGCTTCCTcCTGA
GCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGTGTGAAACCACTGAAAGATCT
TCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGACAGAGGCCACACTGATACCA
TTCCACTCTCATAAGATGAAGGTATCACACACACTTCATTTTGCTTCTGCGATGCAGTGC
CTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAaCTGCCtGGTTtAtAGAGCTC
TGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTGTCATGGCCTTGAGCAAGTTG
AACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTAGGTACTGTcAAATCCACATC
ACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAAAGAGTACAATGAGGGTTTTT
TAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGCAGCTGATATCCATTTTGTTC
CTCGTATgCCTGTCAAAATCTGACATTctGagTCGCTTCGTTTGTTCGCAACGAGCACAG
TGTGCAAAGctGCTATATATTGTCC
>isotig00012 gene=isogroup00003 length=1483 numContigs=6
AGTTAAAAGTTGAAAAATTGGTGACCATATTTTGACACTCTAGCATATTTGGGAGCTATA
TACTGATTTGGGTTTCACCATGCACAGATGAGGTATATACATAAGTTGAAAGCCTGCAGC
TCTATATTAAAGGCATTGAAGACtcGCCcAAaccgtgTGcgcccTCTGAAAAaGTTAACT
TTCcGTTgCTTGCAaGTGAAGTTTtcTtCTTGTCGCTACAAAATGCAGACAGTAATGAAA
CGTGATACcTtGTtATCTTTtATCTAgACcTGAGATGtCcACGCTGCTATGTACACTGTG
TTGTGGgTATTGACcGTAGCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGACGG
TAGCAAGCTGTGTGTGTATTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACACC
tCAtTcGAAACTAGGTCAGAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACAcc
cttttaaggagaagtattttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGGTA
GCATGCAACTTAAAAtTTTGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATTAC
AAGGttAAtCtacTGCCCTTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCTTt
CAaTAaTTATACAAACAAATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTCAA
GTTTATCAATGTAATAAGTTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCTGT
GGATAAGACTGCCAGACTATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCCTT
CACCTCCTTGCAGATGTACCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATTAA
AATCTGGCTTCCTcCTGAGCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGTGT
GAAACCACTGAAAGATCTTCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGACA
GAGGCCACACTGATACCATTCCACTCTCATAAGATGAAGGTATCACACACACTTCATTTT
GCTTCTGCGATGCAGTGCCTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAaCT
GCCtGGTTtAtAGAGCTCTGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTGTC
ATGGCCTTGAGCAAGTTGAACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTAGG
TACTGTcAAATCCACATCACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAAAG
AGTACAATGAGGGTTTTTTAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGCAG
CTGATATCCATTTTGTTCCTCGTATgCCTGTCAAAATCTGACATTctGagTCGCTTCGTT
TGTTCGCAACGAGCACAGTGTGCAAAGctGCTATATATTGTCC
>isotig00013 gene=isogroup00003 length=1485 numContigs=5
TGTGTGTGTGTGGTGCTTCCccTCTAGGGCTGTAAATTTCAAAGGAACCTTGCGCAAGAA
CAGtAGCTTGCGaCGTTTTTCAAaaCCAGAGGTTCTGAACTGAACTGTACTGACTACTGT
AGGGtacTTAAaGGCATTGAAGACTCGCCcAAaCCatgTGCCGCGctttGAAAAAGTTAA
CTTTCCGTTGCTTGCAAATGAcGTTTtcTtCTtGTCgCTACAAAATGCAGACAGTAaTgA
AACGTGATACcTtGTtATCTTTtATCTAgACctGAGATGtCcACGCTGCTATGTACACTG
TGTTGTGGgTATTGACcGTAGCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGAC
GGTAGCAAGCTGTGTGTGTATTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACA
CCtCAtTcGAAACTAGGTCAGAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACA
cccttttaaggagaagtattttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGG
TAGCATGCAACTTAAAAtTTTGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATT
ACAAGGttAAtCtacTGCCCTTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCT
TtCAaTAaTTATACAAACAAATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTC
AAGTTTATCAATGTAATAAGTTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCT
GTGGATAAGACTGCCAGACTATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCC
TTCACCTCCTTGCAGATGTACCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATT
AAAATCTGGCTTCCTcCTGAGCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGT
GTGAAACCACTGAAAGATCTTCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGA
CAGAGGCCACACTGATACCATTCCACTCTCATAAGATGAAGGTATCACACACACTTCATT
TTGCTTCTGCGATGCAGTGCCTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAa
CTGCCtGGTTtAtAGAGCTCTGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTG
TCATGGCCTTGAGCAAGTTGAACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTA
GGTACTGTcAAATCCACATCACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAA
AGAGTACAATGAGGGTTTTTTAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGC
AGCTGATATCCATTTTGTTCCTCGTATgCCTGTCAAAATCTGACATTctGagTCGCTTCG
TTTGTTCGCAACGAGCACAGTGTGCAAAGctGCTATATATTGTCC
>isotig00014 gene=isogroup00003 length=1459 numContigs=6
GGACAATATATAGCagCTTTGCACACTGTGCTCGTTGCGAACAAACGAAGCGActCagAA
TGTCAGATTTTGACAGGcATACGAGGAACAAAATGGATATCAGCTGCCCAAAGGTACTGG
AATGCAAGCTCTGTTGGGTGAATTAAAAAaCCcTCATTGTACTCTTTTATCATGGTCAGC
GTAGCTGGAACCAGCAATGATGGTGATGTGGATTTGACAGTACCTAGACAAGAATTCAAT
AGCTGCTTTGTGTTGAAGTGGGTTCAACTTGCTCAAGGCCATGACAGGATGACGACTAGA
CATtAGGATGAATTATCTGTTGCAGAGCTCTATAAaCCAGGCAGTtGTtAAAaCATCCAT
TGGGTGACCcTCACAGTCTACCAGGCACTGCATCGCAGAAGCAAAATGAAGTGTGTGTgA
TACCTTCATCTTATGAGAGTGGAATGGTATCAGTGTGGCCTCTGTCAATTTGGCTTCAAG
TCTTTGTTTTCTTGAAAGCTGAGAaGATCTTTCAGTGGTTTCACACTGACCTCTGTACTT
GACAATTCATGTGTATTTGCCAGCTCAGgAGGAAGCCAGATTTTAATTACATTAACCAAT
GCTGACTTTTTTttGGACATGTGGTACATCTGCAAGGAGGTGAAGGAAGCAGTTGGAGTT
GCATGGATGTGGAATCTTGTTGATAGTCTGGCAGTCTTATCCACAGATTATCCCAAAGCT
TCTCCACATTGCCACATTCAGAACTTATTACATTGATAAACTTGAAAATtGCAGGAATCT
AaCcAaGCACCcATCAaGGGAaTTTGTTTGTATAATtATtGAAaGCTGTGACcTTCTGAT
GTGACAGACTAATGTGAAaTAAAGGgCAgtaGaTTaCCTTGTaaTGAACCttGTTATTGT
TTGATTGTATCTAAtGTTTGCAaaTTTTAAGTTGCATGCTACCAATTGAAACATAATTCT
TTCTCTAttaatgggatataaaatacttctccttaaaagggTGTgAaGACTcggCACAAA
GAAACGTCtaTGCcGgtAaTCTGACCTAGTTTcgAatGaGGTGTAACagAAGTgTtAGAC
ACcACCAttGATCCcAGAAAATACACACACAGCTTGCTACCGTCGGTAaTTAGACACTAG
TGTACAGTCAaTACATACAGCTAcGgTCAATACCCACAaCACAgTGTAcATAGCAGCGaT
GgACATCTCAGGTCTAGATAAAAGATAaCAAGGTATCACGTTTCATtaCTGTCTGCATTT
tGTAGCgaCAagAAGAAAAcgtCATTtGCAAGCAaTGgAAAGTtAACTTTTTCaGAGCGc
agCAcGCgggTTGGGGCAAGTCTTCCAAGCCTTTAAGTtGACAtcTTGCCTTTGGCTATC
CAGGgTGACAAGATGATACTAGCAGGTAgagtgactaattgagccctgtgtgagaaacca
atgcagaatctagcctagt
>isotig00015 gene=isogroup00003 length=1138 numContigs=4
TGAATGAGAAAtGAAATTTAGCGAAGAAATCACCTTGTAAATTAAAAACTAAAATGGCTT
TCACACAAATTAaCAGTAAAtGgAGAATGTTTTTAAAGCAATATATGCAGTACAGCcATT
CATTGGAAAACAGTAAcAAAaTACATTTATCTTGTtcATTTTtACctCctGCAAaacTTA
cAaCcGTTAATTATGTAGATTGGATGGCACTAACAGGGTACTTGTCTTATCTGCCTATTG
GATAATGTGGcATTAATACTACTGTGTATGGGCACTGAGGCTGAGAGTGCAGTAAGTTtA
AAGGCATTGAAGACTCtCCCCGAaCcGCGtGCCGGGCTctGAAAAAGTtAaCTGCTCGCA
AaTtAcGTTTtCTtCTTGTCaCTaCAAAaTGCAGACATTaaTGAAACGTGATACCTTGTt
ATCTTTTATCTAGACCTGAGATGTCcAtCGCTGCTATgTACAcTGTGTTGTGGGTATTGA
CcgTAGCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGACGGTAGCAAGCTGtGT
TTGTATTTTCtGGGATCGatGGCAGTGTCTAACACTTcTGTtACACCTCATtcGAAACTA
GGTCAGATTACCGGCATTAGACGTtCTTTTTGCgCGAGTCTTCACACCCTTTtAAAGctA
CTCCAtgCTGACAcACGtGgTTCCGGacTACAGAGCAATAAAAaGTAACATTCACTCCTT
GAagTtaCTCCATGCTGgCTGCCCTTAtaGATGTGGCaatGGAtaCGGACgAGAGACTTC
ACTTCTGTTGGTTGCaaaaTTCCATACACCATGGAAGCATGGAACTCACAAAACTAGtGT
TGTAgAGGGGGAGCATAGtctATGtAAATGTatGTTCTACGCCTCTGTCCCaGCTGGAAT
GGCCAGTTTATCTGCCACAATGAAGAATTGTTTGGGgTTCAATtCTGGtCcgaGAGATAG
GATGAAaGGCTGtcAATATTGTCCTTGTCTGCCCTGTGCTGCgCTCTCAATATCTGTGCC
CTcccTCGaacaCTGTTattCACTTCTTCGTGGAAACCTTTATTTGTAAGAAAAGTTCTT
AAAGACTCAGCCAttGCTAATTTATAACCTTTACTCTAGCTTAGACATACGGTCGTCT
>isotig00016 gene=isogroup00003 length=2185 numContigs=5
ATGAATGCTGGCCAGATATTTATCGCCTTGATGGCACAACTTTTCAACGCATGTCTTCTC
GTTTCTTCCAATTTCGATAGTGACATAGCTGACTCGACACTAGGAAAGAGATCTACAGGG
TTCGTGGACACGTTTGGGAAGCGTTTTGTTGACTCATTCGGTAAACGCGTGGACGAATTT
GATTATGATCACAATGGGAACTATGCCGAACAAAGTGAACAATCTTCATACATCAGTCCT
CAACTCAAACGAGGTCAAAAAGGACTGAGAAGCGGATCATTTATTGATGCTTTCGGGAAA
CGGAGTTCCTTCCAAGAAGTCGATGAGAAGAGGTTCGCGGACTCATTCGGCAAAAGATTC
GCGGACTCATTTGGGAAAAGGAGCCCGGTAGGATTTGTTGACACCTTGGGTAAAAGATTT
GCGGTCTCATTCGGTAAAAGAAATACAGTCGGATTTGTTGACACTTTGGGTAAAAGATTC
GCAGACTCGTTCGGCAAGCGGTCTCAACAAGGTTTTGTAGATGCATTCGGCAAACGATAC
CAGGGCGTTTACTAA
>isotig00017 gene=isogroup00003 length=2185 numContigs=5
ATGTGTGGCTGCATTGACGACGCAGAGTTTGCAGCAACTCATCAAGTCCAGTTTTGTGAA
ATCAATTCTGCGACATTCAATCCAAGAGAAGATCCTCTTATTGATTGTCTATATTCGGCC
AAAGACAGCGCTATTTGCTCGTGCCCTGAACTTTGCAGTGAACTCGTATACGAAGTCTCC
AAAGACTCTGTTGATTGGCCAAATATGGCAAACCTGCTCCCGTTCTTGGAGCAAATAAAT
TCATCAATGACGGGCAAACCTGCCCGAACATTTTTCGACTCGATAATTAACCACTACAGA
GCCGGTCGCCATGATGAAGCACTAGATTCAGTTCGGAGTACGTTTCTTCAACTCAATATC
TACATAGAGACAATGGAGGTTGAAGAATACACGGACAGACCCGTTTATGAT
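The exemplar records above carry definition lines such as `>isotig00009 gene=isogroup00003 length=1827 numContigs=2`: an ID followed by space-separated `key=value` attributes. The sketch below reads them with the standard library only. `read_fasta_with_attributes` is a hypothetical helper and not part of NpSearch, which loads BioRuby via `require 'bio'`. It splits each record into its ID, its attributes and its sequence. The `length` attribute is only what the header declares; nothing here checks it against the sequence that was actually read.

```ruby
require 'stringio'

# Hypothetical helper (not part of NpSearch): read FASTA records and split each
# definition line into an ID plus a hash of key=value attributes.
# Lines before the first '>' are skipped in this sketch.
def read_fasta_with_attributes(io)
  records = []
  current = nil
  io.each_line do |raw|
    line = raw.chomp
    next if line.empty?
    if line.start_with?('>')
      id, *pairs = line[1..-1].split
      attrs = {}
      pairs.each do |pair|
        key, value = pair.split('=', 2)
        attrs[key] = value
      end
      current = { id: id, attrs: attrs, seq: ''.dup }
      records << current
    elsif current
      # Sequence lines are concatenated; lowercase runs are kept as-is.
      current[:seq] << line
    end
  end
  records
end

sample = ">isotig00016 gene=isogroup00003 length=2185 numContigs=5\n" \
         "ATGAATGCTGGC\nCAGGGCGTTTACTAA\n"
rec = read_fasta_with_attributes(StringIO.new(sample)).first
p rec[:id]                   # => "isotig00016"
p rec[:attrs]['length'].to_i # => 2185 (declared length)
p rec[:seq].length           # => 27 (residues actually read)
```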
data/lib/npsearch.rb
CHANGED
@@ -54,8 +54,8 @@ module NpSearch
 end
 
 def initialise_seqs(entry)
-return if entry.aaseq.length > @opt[:
-sp = Signalp.analyse_sequence(entry.aaseq)
+return if entry.aaseq.length > @opt[:max_orf_length]
+sp = Signalp.analyse_sequence(entry.aaseq.to_s)
 return if sp[:sp] == 'N'
 # seq = Sequence.new(entry.entry_id, entry.definition, entry.aaseq, sp)
 seq = Sequence.new(entry, sp)
@@ -1,4 +1,5 @@
 require 'bio'
+
 # Top level module / namespace.
 module NpSearch
 # A class that validates the command line opts
@@ -6,6 +7,7 @@ module NpSearch
 class << self
 def run(opt)
 assert_file_present('input fasta file', opt[:input_file])
+opt[:input_file] = File.expand_path(opt[:input_file])
 assert_input_file_not_empty(opt[:input_file])
 assert_input_file_probably_fasta(opt[:input_file])
 opt[:type] = assert_input_sequence(opt[:input_file])
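The added line replaces the user-supplied input path with an absolute one using Ruby's `File.expand_path`. That method resolves a relative path against a base directory, which is the current working directory (`Dir.pwd`) unless a second argument is given. It also expands a leading `~`. A plausible motivation, not stated in the diff, is keeping the path valid if it is later used from another working directory. The snippet below only demonstrates the call. The `/home/user/NpSearch` base is a made-up example, and the printed results assume a POSIX system.

```ruby
# Demonstrates File.expand_path with an explicit base directory so the output
# does not depend on where the script is run. Paths are illustrative only.
base = '/home/user/NpSearch'

puts File.expand_path('exemplar_data/genetic_data.fa', base)
# prints /home/user/NpSearch/exemplar_data/genetic_data.fa
puts File.expand_path('../shared/input.fa', base)
# prints /home/user/shared/input.fa
puts File.expand_path('/data/input.fa', base)
# prints /data/input.fa (an absolute path is returned unchanged)
```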
@@ -48,8 +50,9 @@ module NpSearch
 exit 1
 end
 
+# determine file sequence type based on first 500 lines
 def type_of_sequences(file)
-fasta_content =
+fasta_content = File.foreach(file).first(500).join("\n")
 # the first sequence does not need to have a fasta definition line
 sequences = fasta_content.split(/^>.*$/).delete_if(&:empty?)
 # get all sequence types
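The reworked `type_of_sequences` now reads at most 500 lines. Called without a block, `File.foreach` returns an Enumerator, so `first(500)` stops after 500 lines instead of loading the whole file. Each yielded line still ends in its newline, so `join("\n")` adds an extra blank line after every line. The chunks produced by the following `split` contain line breaks either way, so this only matters if later code does not strip whitespace. A bare `join` reproduces the original text exactly.

How NpSearch then classifies each chunk is outside this hunk. The sketch below therefore assumes a simple alphabet rule: a record made only of A, C, G, T, U or N (any case) is `:genetic`, and anything else is `:protein`. Both `guess_sequence_type` and that rule are illustrative. A short peptide written only in those letters would be misclassified.

```ruby
require 'tempfile'

# Illustrative alphabet rule; NpSearch's own classification is not shown here.
NUCLEOTIDE_ONLY = /\A[ACGTUN]+\z/i

# Guess whether a FASTA file holds nucleotide (:genetic) or protein data from
# its first `max_lines` lines. Returns :genetic, :protein, :mixed or nil.
def guess_sequence_type(path, max_lines: 500)
  # Without a block File.foreach returns an Enumerator, so first(n) reads only
  # n lines. Each line keeps its "\n", hence join with no separator.
  head = File.foreach(path).first(max_lines).join
  # As in the diff, split on definition lines so a leading record without a
  # '>' line is kept; then drop whitespace and empty chunks.
  chunks = head.split(/^>.*$/).map { |s| s.gsub(/\s+/, '') }.reject(&:empty?)
  return nil if chunks.empty?

  types = chunks.map { |s| s.match?(NUCLEOTIDE_ONLY) ? :genetic : :protein }.uniq
  types.size == 1 ? types.first : :mixed
end

Tempfile.create(['demo', '.fa']) do |f|
  f.write(">s1\nATGAAT\ngctggc\n>s2\nACGTN\n")
  f.flush
  p guess_sequence_type(f.path) # => :genetic
end
```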
data/lib/npsearch/output.rb
CHANGED
@@ -18,8 +18,8 @@ module NpSearch
 sorted_sequences.each do |s|
 if input_type == :protein
 f.puts ">#{s.defline}\n#{s.signalp}#{s.seq}"
-elsif input_type == :
-f.puts ">#{s.defline}
+elsif input_type == :genetic
+f.puts ">#{s.defline}"
 f.puts "#{s.signalp}#{s.seq}"
 end
 end
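After this change, the `:genetic` branch writes the definition line on its own and then the signal peptide immediately followed by the rest of the sequence. This is the same layout the `:protein` branch produces with a single `puts`.

The sketch below reproduces that layout. It uses a hypothetical `Candidate` struct that only mimics the `defline`, `signalp` and `seq` readers the output code calls; NpSearch's real sequence objects may carry more. The sequence values are made up.

```ruby
require 'stringio'

# Hypothetical stand-in for the objects output.rb iterates over; only the
# three readers used there (defline, signalp, seq) are modelled.
Candidate = Struct.new(:defline, :signalp, :seq)

# Write candidates in the layout of the :genetic branch: a '>' definition line,
# then the signal peptide immediately followed by the rest of the sequence.
def write_candidates(io, candidates)
  candidates.each do |c|
    io.puts ">#{c.defline}"
    io.puts "#{c.signalp}#{c.seq}"
  end
  io
end

out = write_candidates(StringIO.new, [Candidate.new('isotig00016', 'MKLL', 'AVRG')])
print out.string
# >isotig00016
# MKLLAVRG
```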
data/lib/npsearch/signalp.rb
CHANGED
@@ -1,4 +1,6 @@
 require 'forwardable'
+require 'open3'
+require 'timeout'
 
 # Top level module / namespace.
 module NpSearch
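The new `run_signalp` in this file builds a shell pipeline from `echo` and the SignalP path, runs it with `Open3.popen3`, and wraps everything in `Timeout::timeout(300)`. That pattern has three caveats:

- When the timeout fires, Ruby raises in the calling thread but does not stop the child process.
- The `wait_thr` returned by `popen3` is never waited on.
- Interpolating the sequence into a shell string relies on it containing no shell metacharacters.

The sketch below is one alternative, not NpSearch's code. It passes the command as an argument array (no shell), writes the FASTA record to the child's stdin and drains stderr in a thread. If the child overruns, it is killed and reaped. `run_with_deadline` is a hypothetical name, and the example uses a Ruby one-liner as a stand-in for SignalP.

```ruby
require 'open3'
require 'rbconfig'
require 'timeout'

# Run `argv` (an array, so no shell is involved) with `input` on stdin.
# Returns [stdout_text, Process::Status], or nil if the child ran longer than
# `seconds`, in which case it is killed and reaped.
def run_with_deadline(argv, input, seconds)
  Open3.popen3(*argv) do |stdin, stdout, stderr, wait_thr|
    # Drain stderr concurrently so a chatty child cannot block on a full pipe.
    err_reader = Thread.new { stderr.read }
    stdin.write(input)
    stdin.close
    begin
      out = Timeout.timeout(seconds) { stdout.read }
      [out, wait_thr.value]
    rescue Timeout::Error
      begin
        Process.kill('KILL', wait_thr.pid)
      rescue Errno::ESRCH
        # The child exited on its own just before the kill.
      end
      wait_thr.value # reap the child so it does not linger as a zombie
      nil
    ensure
      err_reader.join
    end
  end
end

fasta = ">seq\nMKLLAVLAAV\n"
# Stand-in for SignalP: a Ruby one-liner that copies stdin to stdout.
cat = [RbConfig.ruby, '-e', 'print $stdin.read']
out, status = run_with_deadline(cat, fasta, 5)
p status.success? # => true
p out == fasta    # => true
# With SignalP the argv could be, e.g. (the path is hypothetical; the flags
# are the ones in the diff):
# ['/path/to/signalp', '-t', 'euk', '-f', 'short', '-U', '0.34', '-u', '0.34']
```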
@@ -11,33 +13,34 @@ module NpSearch
 def analyse_sequence(seq)
 sp_headers = %w(name cmax cmax_pos ymax ymax_pos smax smax_pos smean d
 sp dmaxcut networks orf)
-
-
-
-
-sp_results = s.split("\n").delete_if { |l| l[0] == '#' }
-sp_results.each_with_index do |line, idx|
-line = line + ' ' + data[:seq][idx].to_s
-orf_results << Hash[sp_headers.map(&:to_sym).zip(line.split)]
+seqs = setup_analysis(seq)
+sp_results = []
+seqs.each do |seq|
+sp_results << run_signalp(seq, sp_headers)
 end
-
+sp_results.sort_by { |h| h[:d] }.reverse[0]
 end
 
-
-
-
-
-
-
-
+private
+
+def run_signalp(seq, sp_headers)
+Timeout::timeout(300) do
+cmd = "echo '>seq\n#{seq}\n' | #{opt[:signalp_path]} -t euk" \
+" -f short -U 0.34 -u 0.34"
+stdin, stdout, stderr, wait_thr = Open3.popen3(cmd)
+out = stdout.gets(nil).split("\n").delete_if { |l| l[0] == '#' }
+stdin.close; stdout.close; stderr.close
+result = out[0] + ' ' + seq
+return Hash[sp_headers.map(&:to_sym).zip(result.split)]
 end
-
+rescue Timeout::Error
+no_results = [0,0,1,1,1,1,1,1,1,'N',1,1, seq]
+return Hash[sp_headers.map(&:to_sym).zip(no_results)]
 end
 
-def
-
-
-fasta
+def setup_analysis(seq)
+orfs = seq.scan(/(?=(M\w{#{opt[:min_orf_length]},}))./).flatten
+(opt[:type] == :protein || orfs.empty? || orfs.nil?) ? [seq] : orfs
 end
 end
 end
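`setup_analysis` finds candidate ORFs with `seq.scan(/(?=(M\w{n,}))./)`. The lookahead is zero-width, so at every position holding an `M` it captures the run of word characters from that `M` onwards. The trailing `.` then advances by one character, so nested start codons each yield their own overlapping candidate. `\w` does not match `*`, so if stops in the translated sequence are written as `*` (an assumption here), each candidate ends at the next stop. `scan` always returns an array, so the `orfs.nil?` check can never be true.

`analyse_sequence` then keeps the result with the largest `d` via `sort_by { |h| h[:d] }.reverse[0]`. The `d` values parsed from SignalP output are Strings, but the timeout fallback stores the Integer `1`. If one ORF times out and another does not, `sort_by` has to compare a String with an Integer and raises `ArgumentError`.

The sketch below uses hypothetical helper names. It reproduces the ORF scan and picks the best result with `max_by` and `to_f`, which avoids that error. It also shows that the fallback's `d = 1` outranks any real D score, which a caller may or may not want.

```ruby
# Same lookahead pattern as setup_analysis: every substring starting at an 'M'
# followed by at least `min_len` word characters; overlapping hits included.
def candidate_orfs(protein, min_len)
  protein.scan(/(?=(M\w{#{min_len},}))./).flatten
end

# Highest-D result; to_f makes String and Integer d values comparable.
def best_by_d(results)
  results.max_by { |h| h[:d].to_f }
end

# Hypothetical translated sequence with '*' marking a stop.
p candidate_orfs('KMAVLMRW*GGMPQ', 2) # => ["MAVLMRW", "MRW", "MPQ"]
p candidate_orfs('KMAVLMRW*GGMPQ', 3) # => ["MAVLMRW"]

# d as parsed from text (String) next to a timeout-style fallback (Integer).
results = [{ d: '0.812', sp: 'Y' }, { d: 1, sp: 'N' }]
# results.sort_by { |h| h[:d] } raises ArgumentError here.
p best_by_d(results)[:sp] # => "N" (the fallback wins)
```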
data/lib/npsearch/version.rb
CHANGED
data/templates/contents.slim
CHANGED
@@ -5,7 +5,7 @@ html lang="en"
 meta content="IE=edge" http-equiv="X-UA-Compatible"
 meta content="width=device-width, initial-scale=1" name="viewport"
 meta content="NpSearch | Identify Novel Neuropeptides" name="description"
-meta content="
+meta content="Moghul et al." name="author"
 title NpSearch | Identify Novel Neuropeptides
 css:
 html { position: relative; min-height: 100%; }
@@ -28,10 +28,7 @@ html lang="en"
 - @sorted_sequences.each do |seq|
 p.sequence
 span.id
-
-| >#{seq.defline}
-- elsif @opt[:type] == :nucleotide
-| >#{seq.defline}-(frame:#{seq.translated_frame})
+| >#{seq.defline}
 br
 span.seq== seq.html_seq
 br
@@ -39,13 +36,15 @@ html lang="en"
 br
 footer
 p
-| Please cite "Moghul
+| Please cite "Moghul
 em
-| (in prep)
-| NpSearch:
+| et al. (in prep)
+| NpSearch: Identify Novel Neuropeptides"
 br
 | Developed at
-a href="https://wurmlab.github.io" target="_blank" Wurm Lab
+a href="https://wurmlab.github.io" target="_blank" Wurm Lab
+| &
+a href="http://www.sbcs.qmul.ac.uk/staff/mauriceelphick.html" target="_blank" Elphick Lab
 | ,
 a href="http://www.sbcs.qmul.ac.uk" target="_blank" QMUL
 br
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: npsearch
 version: !ruby/object:Gem::Version
-version: 2.1.
+version: 2.1.1
 platform: ruby
 authors:
 - Ismail Moghul
@@ -12,7 +12,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-
+date: 2016-11-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: bundler
@@ -117,6 +117,8 @@ files:
 - README.md
 - Rakefile
 - bin/npsearch
+- exemplar_data/README.md
+- exemplar_data/genetic_data.fa
 - lib/npsearch.rb
 - lib/npsearch/arg_validator.rb
 - lib/npsearch/output.rb