npsearch 2.1.0 → 2.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: af22531e55865ab286dd6599917196765d72af12
- data.tar.gz: 5a3bf459332ff8bc70c3c6e431cae9e09fe0494c
+ metadata.gz: 0e02888758654087d6af73b5fc87bba5363a0b4c
+ data.tar.gz: 2fd35312b2b18d3dfefe99686397dfd6bd7d5cfb
  SHA512:
- metadata.gz: 899fed317d7ceb7a62d52fb2b3e0e24e835f630c058a9dacf05e267019a900c5c2357588e9f1bdd674731eb951a44f16d5898747efb872e6ae1ceaf5efb8acf4
- data.tar.gz: 0aab6be7e635dd63b2e8e2d4d5eda128f7f39b41977f5b08fb090408dd612ac98d41d3fe60a086ea5a16d2cff56bccbce79b8168c3f2af2e3fff222b1657b908
+ metadata.gz: 3683325fc2081158d10ab07164e19d8dff0fb16d1031b592c9fdbd6221f13b1877f5f959d78d06acb6978cb351f1e5f96a3d53a9758af16f1f4d6c04a283019c
+ data.tar.gz: c9581cbd2ec7b00b22931fc0957e69bdb8cb525981e106ab759ee876e0c9de673d89bcc1156f5792871ec9c0c13357ac8a7ef0a68326c221d7e9873d5608f4fd
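The digests in checksums.yaml can be checked against the packaged files. The following is a minimal sketch, assuming `metadata.gz` or `data.tar.gz` has already been extracted from the `.gem` archive; the helper name `digest_matches?` is hypothetical and is not part of RubyGems or NpSearch.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper (not part of RubyGems): returns true when the file at
# `path` hashes to `expected_hex` under the given digest class.
def digest_matches?(path, expected_hex, algorithm: Digest::SHA512)
  # Digest::Class.file reads the file and returns a digest instance.
  algorithm.file(path).hexdigest == expected_hex.strip.downcase
end
```

For example, `digest_matches?('data.tar.gz', 'c9581cbd…')` would be called with the full SHA512 value listed above, and `algorithm: Digest::SHA1` would be passed to check the SHA1 entries instead.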
data/README.md CHANGED
@@ -3,24 +3,34 @@
  [![Gem Version](https://badge.fury.io/rb/npsearch.svg)](http://badge.fury.io/rb/npsearch)
  [![Dependency Status](https://gemnasium.com/wurmlab/NpSearch.svg)](https://gemnasium.com/wurmlab/NpSearch)

-
+ <strong>Please note this currently in beta. We are currently working on something that is amazingly fast (i.e. a few seconds to run) and a lot better in every sense (it even has an easy-to-use clicky, pointy interface). So watch this place.</strong>

  ## Introduction
- NpSearch is a tool that helps identify novel neuropeptides. As such it is not based on homology to existing neuropeptides - rather NpSearch is based on the common characteristics of neuropeptides and their precursors.
+ NpSearch is a tool that helps identify novel neuropeptides. As such it is not based on homology to existing neuropeptides - rather NpSearch is based on the common characteristics of neuropeptides and their precursors. In other words, it is a feature based tool.
+
+ The results produced includes the entire secretome ordered in the likelihood of the sequence encoding a neuropeptide. As such, it is expected that you only need to analyse the top half of the results.
+
+ Importantly, NpSearch produces a highly visual html file where the signal peptide and potential cleavage sites are highlighted. Additionally, NpSearch produces a fasta file of the results (i.e. the ordered secretome) that can easily be used in your own pipelines.

  If you use this program, please cite us:

  >Moghul I, Rowe M, Priyam A, ELphick M & Wurm Y <em>(in prep)</em> NpSearch: A Tool to Identify Novel Neuropeptides

- NpSearch produces a fasta file and highly visual html file that are ordered by the likelihood of a sequence encoding a neuropeptide precursor.
+ NpSearch requires an input of a transcriptomic or predicted proteomic dataset, where each sequence is analysed and awarded a relative score of its likelihood of encoding a neuropeptide precursor. When provided with transcriptomic data, NpSearch translates each contig in all six frames and thereafter extracts all potential open reading frame (methionine to stop codon). Each predicted protein sequence is then analysed for the following neuropeptide-related characteristics:
+
+ **Signal peptide**: All neuropeptide precursors must have a signal peptide. This is due to the fact that the final bioactive neuropeptide has to be secreted from the cell of synthesis in order to be functionally active.
+
+ **Cleavage sites**: Being derived from a precursor, the bioactive neuropeptide has to be cleaved out from the precursor. Prohormone convertase enzymes cleave these bioactive peptides at specific cleavage sites. As certain cleavage motifs are more likely to be cleaved than other cleavage motifs, NpSearch awards sequences based on the type and number of cleavage sites present.
+
+ **C-terminal Glycine**: A significant number of bioactive neuropeptides have a C-terminal glycine that is amidated during post-translation modification. Thus such sequences are awarded with a higher score.
+
+ **Repeated peptides**: Numerous neuropeptide precursors are made up of multiple copies of the same neuropeptide. NpSearch attempts to clustering all potential cleaved neuropeptides, and then awarding sequences that produce larger clusters with a higher score.

- NpSearch orders the results based on the following characteristics:
+ **Acidic spacer regions**: Neuropeptide precursors that contain multiple neuropeptide copies tend to have highly acidic regions that separate these copies. If detected by NpSearch, the sequence is awarded with a higher score.
+
+
+ After analysing each sequence in the input dataset, NpSearch produces a visual html file and a fasta file, where sequences that are more likely to encode a neuropeptides precursor are placed at the top of the file. These results files can then be easily inspected and curated by researchers.

- - **Signal peptide**: All neuropeptide precursors must have a signal peptide. This is due to the fact that the final bioactive neuropeptide has to be secreted from the cell of synthesis in order to be functionally active.
- - **Cleavage sites**: Being derived from a precursor, the bioactive neuropeptide has to be cleaved out from the precursor. Prohormone convertase enzymes cleave these bioactive peptides at specific cleavage sites. Since certain cleavage motifs are more likely to be cleaved, NpSearch awards sequences with cleavage site motifs that are more likely to be cleaved with a higher score.
- - **C-terminal Glycine**: A significant number of bioactive neuropeptides have a C-terminal glycine, that is amidated during post-translation modification. NpSearch awards sequences that have a potential neuropeptide with a C-terminal glycine a higher score.
- - **Repeated peptides**: Some neuropeptide precursors contain numerous copies of the same neuropeptides (usually with slight sequence differences). NpSearch attempts to detect this by aligning all potential neuropeptides within a sequence. If a sequence is found to have multiple, similar predicted NPs, NpSearch awards it with a higher score.
- - **Acidic spacer regions**: Neuropeptide precursors that contain multiple neuropeptide copies tend to have highly acidic spacer regions that separate the NP copies. If detected by NpSearch, the sequence is awarded with a higher score.



@@ -31,12 +41,15 @@ NpSearch orders the results based on the following characteristics:

  ### Installation Requirements
  * Ruby (>= 2.0.0)
- * SignalP 4.1 (Available from [here](http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?signalp))
+ * SignalP 4.1.*z (Available from [here](http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?signalp))
  * CD-HIT (Available from [here](http://weizhongli-lab.org/cd-hit/) - Suggested Installation via [Homebrew](http://brew.sh) or [Linuxbrew](http://linuxbrew.sh) - `brew install homebrew/science/cd-hit`)
  * EMBOSS (Available from [here](http://emboss.sourceforge.net) - Suggested Installation via [Homebrew](http://brew.sh) or [Linuxbrew](http://linuxbrew.sh) - `brew install homebrew/science/emboss`)


  ## Installation
+
+ <strong>While in beta, it is suggested that you run NpSearch from source (i.e. the non-recommended method below)</strong>
+
  Simply run the following command in the terminal.

  ```bash
@@ -52,7 +65,7 @@ It is also possible to run from source. However, this is not recommended.
  # Clone the repository.
  git clone https://github.com/wurmlab/npsearch.git

- # Move into NpSearch source directory.
+ # Move into the NpSearch source directory.
  cd NpSearch

  # Install bundler
@@ -86,35 +99,35 @@ npsearch
  You should see the following output.

  ```bash
- * Usage: npsearch [Options] -i [Input File]
+ * Description: A tool to identify novel neuropeptides.

- * Mandatory Options:
+ * Usage: npsearch [Options] [Input File]

- -i, --input [file] Path to the input fasta file
-
- * Optional Options:
- -s, --signalp_path The full path to the signalp script. This can be downloaded from
- CBS. See https://www.github.com/wurmlab/NpSearch for more
- information
- -u, --usearch_path The full path to the usearch binary. This script can be downloaded
- from .... See https://www.github.com/wurmlab/NpSearch for more
+ * Options
+ -s path_to_signalp, The full path to the SignalP script. This can be downloaded from
+ --signalp_path CBS. See https://www.github.com/wurmlab/NpSearch for more
  information
- -n, --num_threads The number of threads to use when analysing the input file
- -m, --orf_min_length N The minimum length of a potential neuropeptide precursor.
+ -d, --temp_dir path_to_temp_dir The full path to the temp dir. NpSearch will create the folder and
+ then delete the folder once it has finished using them.
+ Default: Hidden folder in the current working directory
+ -n, --num_threads num_of_threads The number of threads to use when analysing the input file
+ -l, --min_orf_length N The minimum length of a potential neuropeptide precursor.
  Default: 30
+ -m, --max_seq_length N The maximum length of a potential neuropeptide precursor.
+ Default: 600
  -h, --help Display this screen
  -v, --version Shows version
-
  ```


- ### Example Usage Scenario
+ ### Exemplar Usage Scenario
  The following runs NpSearch on an input fasta dataset.

  ```bash
- npsearch -i INPUT_FASTA_FILE -s /path/to/signalp -u /path/to/usearch -n NUM_THREADS
+ npsearch -s /path/to/signalp -n NUM_THREADS INPUT_FASTA_FILE
  ```
117
129
 
118
- ## Output
119
- The output produced by NpSearch is presented in two manners. NpSearch produces a highly visual HTML file that can be open in any browsers (an example can seen [here]()) and a fasta file.
130
+ ## Note
120
131
 
132
+ - With the current version of NpSearch, there is an issue with the number of threads used - it seems to use more threads than that specified in the command line argument
133
+ - NpSearch is expected to produce a high system load (as shown in `top` / `htop`) - this is because NpSearch runs SignalP as a separate process for each sequence (to speed things up). As such the system load (which is the number of processes called per unit time) can be higher than expected. This is normally not a reason for concern - however, we will probably try and find the middle ground between the speed and the number of processes called (or maybe someone could rewrite SignalP in C with multicore support)...
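The README text in this diff describes extracting open reading frames (methionine to stop codon) from each translated frame and bounding them by a minimum and maximum length. The following is a minimal sketch of that step only, not NpSearch's own implementation. It assumes a frame has already been translated into a protein string with `*` marking stop codons, and that lengths are counted in residues; the method name `extract_orfs` is hypothetical.

```ruby
# Hypothetical sketch: pull methionine-to-stop ORFs out of one translated frame.
# Assumes '*' marks a stop codon; the 30/600 defaults mirror the documented
# option defaults, but how NpSearch measures length is an assumption here.
def extract_orfs(protein_frame, min_length: 30, max_length: 600)
  # Match from the first 'M' after each stop up to (not including) the next
  # stop. The lookahead drops trailing fragments that never reach a stop codon.
  candidates = protein_frame.scan(/M[^*]*(?=\*)/)
  candidates.select { |orf| orf.length.between?(min_length, max_length) }
end
```

With small bounds for illustration, `extract_orfs('AAMKT*GGMA*MKK', min_length: 2, max_length: 10)` keeps `MKT` and `MA` and drops the unterminated `MKK`.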
data/bin/npsearch CHANGED
@@ -26,7 +26,7 @@ Banner
  opts.on('-d', '--temp_dir path_to_temp_dir',
  'The full path to the temp dir. NpSearch will create the folder and',
  ' then delete the folder once it has finished using them.',
- ' Default: Hidden folder in the current working dirctory') do |p|
+ ' Default: Hidden folder in the current working directory') do |p|
  opt[:temp_dir] = p
  end

@@ -37,17 +37,17 @@ Banner
  end

  opt[:min_orf_length] = 30
- opts.on('-m', '--min_orf_length N', Integer,
+ opts.on('-l', '--min_orf_length N', Integer,
  'The minimum length of a potential neuropeptide precursor.',
  ' Default: 30') do |n|
  opt[:min_orf_length] = n
  end

- opt[:max_seq_length] = 600
- opts.on('-m', '--max_seq_length N', Integer,
+ opt[:max_orf_length] = 600
+ opts.on('-m', '--max_orf_length N', Integer,
  'The maximum length of a potential neuropeptide precursor.',
  ' Default: 600') do |n|
- opt[:max_seq_length] = n
+ opt[:max_orf_length] = n
  end

  opts.on('-h', '--help', 'Display this screen') do
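In 2.1.0 both length options were registered under the short flag `-m`. This hunk gives the minimum length its own flag, `-l`, and renames the maximum to `--max_orf_length`. The following is a minimal, self-contained sketch of the resulting option handling using Ruby's standard `OptionParser`. The wrapper method `parse_length_options` is hypothetical, and the rest of the real script (banner, other options, input handling) is omitted.

```ruby
require 'optparse'

# Hypothetical wrapper around the two length options as they appear in 2.1.1.
def parse_length_options(argv)
  opt = { min_orf_length: 30, max_orf_length: 600 } # defaults from the diff
  parser = OptionParser.new do |opts|
    opts.on('-l', '--min_orf_length N', Integer,
            'The minimum length of a potential neuropeptide precursor.') do |n|
      opt[:min_orf_length] = n
    end
    opts.on('-m', '--max_orf_length N', Integer,
            'The maximum length of a potential neuropeptide precursor.') do |n|
      opt[:max_orf_length] = n
    end
  end
  parser.parse!(argv.dup) # work on a copy so the caller's array is untouched
  opt
end
```

Note that the README usage text in this same release still lists the long flag as `--max_seq_length`, while the script registers `--max_orf_length`.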
File without changes
@@ -0,0 +1,465 @@
1
+ >isotig00001 gene=isogroup00003 length=2185 numContigs=5
2
+ TAGCTGTGATCTAGTGGATCTGACTGGCCTTTTGATTATTTCAGCacGATTCTCAGACTA
3
+ CAGTTGTAAaCCTACTTCGACTACTACTACTActagtacTAACGGTGCAACGTTGTTATA
4
+ AGTTTGCCAAAGGTGAAACTTTAGCCTTAGGACtGTGTTTATTTTATTTGCAGTCGCATT
5
+ CgCCTAACTGTTTTCTGTTACTGGGTGCATTTAACTCACATTAATAGAGGATTTTtGACT
6
+ AGTtCcTAGAGAGTGGTGTTTCTGTTTTACCACCATGGCAAAAAAGGGAAaGCCTCGCCC
7
+ TGACCATAGGCCTCCTGCACACAACCCGCATTATGCTCATGATCCACCACCTTATTCACA
8
+ ACAGCAACCACCACTTCAACAGCAGAACTATGCACAACAAATGCATCATGGTGGAGGTGG
9
+ TGGAAATAGACAACATGCACGACcTAGACCTAGTCCACCTTCAGAAGTCAGTGACTGTGT
10
+ CAAGTACTCCCTTTTCTtGTATAACTGCATCTTTTGGGTAAGTATGCATTCCTCATGACT
11
+ GTTATGTATATGTACGTATTTTAGGTCATCCTGCAAGCAGGAaCTCGCGAAGAAGCcTCA
12
+ TtGGCTTATcAAAGCcGCAAGCTGACCGAAGTCAGTcTcTtAGTTTCATATTtAACGTCC
13
+ ATGATTATGAaTTgTCTATTCTCAACAACTcTGTAACTGGATGACATACATTAATCTTGG
14
+ AGTGACTCGAACAGGGGACCTTATGATTGGAAGGCACCGGCCTTAACTTAACCACTGAGC
15
+ TAACACTCCACATCTTTCAAATTATGTATATAATATATCTTTCAAGATATCTTTCAAATT
16
+ ATACTGATTTGTCTAGTAAGTACAGTACTGTATCACAAACAGTTCAAAACCGACAAAGTG
17
+ CTACACAAACGCAAAGGTTTAAGGTATGGTAGTGTTTGTCTGATGGTATACCTTATCTTT
18
+ TTGGTGATAAGAGCAAAAATGTTCTTTTAATGGTTAAAGTGTAAAGAGGATGTCTTTGTT
19
+ TTtCTGTgAAGTTTAGTTGTAACTTTCAGATACAaGaAAAaGTGAAATGTGCAATGTACT
20
+ GTAAGCTCTCAGAGTTACTCAGTCCTTTAGTTtGCtCTGTGAGATATATGCTGTGAGATA
21
+ TGcTtCAACAGTTCAATTTTCTAACTAAAATTTACATTGGTCATGCAATTTCTTTGTTCG
22
+ TTTGGTTTCTTGTTTTGTTGGTTAGGTTTTGGTGCTTTAAATTACGATGAGGATATATAA
23
+ CAGAGTGTGTTTTCaAACAGCTGGCTGTTATCTGCAGAATCTGGTCACAaCAAGTATACA
24
+ ACCCGCcCGCGTATGGACATATTAATATACCTTTCTCTCATGTGCACTAGAGTTTTTCAT
25
+ TTAGTTACCAAAAAAATCAGTTCTGTGACACATTTTTAGGTTAAAGGTTCAAGGTTGGAG
26
+ AATCCAATAATCATTATACGGTGTGAAGACTCGCGCAAAAAGAACGGCtATGCCgTAATC
27
+ TGACCTaGTTTCGAATGAGGTGTAACAGAAGTGTTAGACACCACCATCGATCCCAGAAAA
28
+ TACACACACAGCTTGCTACCgTCGGTAATTAGACACTAGTGTACAGTCAgTACATACAGC
29
+ TGCAGTCAACACCCACAGCACAGTGTACAAACGGTACAGCGATGGACATCTCAGGTCCAG
30
+ CTAAAGATAACAATGTATCGCGTTTCATTACTGTCTGCATTTTGTAGCGACACGAACAAA
31
+ ACGTCACTTGCAAGCAACAGAAAGTTAACTTTTTCATATGGCTGCATGCGGTTTGGGgCG
32
+ AGTCTTCAGTGCCTTTAAAGTAGATGAAATGGATTGATCTTGAGGAGAAATGCCATCAGG
33
+ tTtCGTTGGCAAACGttCAGGATTTTGTCAGTTTTGCTGTAGTCACATTTAGCAAGATGA
34
+ CGACACAGAAAATATGACGTATAGTACTGCAAAGGAAGGAGCTTATccttTtcGTAATTT
35
+ taattGATtaaGGTtTCAATGCaaGCTTCCATACAGCTTTCAACAGCACATTCAGTTTAA
36
+ AGCAGTATATATGTGAGAACAAAAGGGGTTTTCCCAAAATATTGGgTACCcAAATgggTC
37
+ ACAGCAGACCATAGCAAaCTTTATAAGTGcGCATCttttGACACATATTGAagTGCATAA
38
+ TTtttCTAATAAATTCTTTaaaata
39
+ >isotig00002 gene=isogroup00003 length=1914 numContigs=5
40
+ TGAATGAGAAAtGAAATTTAGCGAAGAAATCACCTTGTAAATTAAAAACTAAAATGGCTT
41
+ TCACACAAATTAaCAGTAAAtGgAGAATGTTTTTAAAGCAATATATGCAGTACAGCcATT
42
+ CATTGGAAAACAGTAAcAAAaTACATTTATCTTGTtcATTTTtACctCctGCAAaacTTA
43
+ cAaCcGTTAATTATGTAGATTGGATGGCACTAACAGGGTACTTGTCTTATCTGCCTATTG
44
+ GATAATGTGGcATTAATACTACTGTGTATGGGCACTGAGGCTGAGAGTGCAGTAAGTTtA
45
+ AAGGCATTGAAGACTCtCCCCGAaCcGCGtGCCGGGCTctGAAAAAGTtAaCTGCTCGCA
46
+ AaTtAcGTTTtCTtCTTGTCaCTaCAAAaTGCAGACATTaaTGAAACGTGATACCTTGTt
47
+ ATCTTTTATCTAGACCTGAGATGTCcAtCGCTGCTATgTACAcTGTGTTGTGGGTATTGA
48
+ CcgTAGCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGACGGTAGCAAGCTGTGT
49
+ GTGTATTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACACCtCAtTcGAAACTA
50
+ GGTCAGAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACAcccttttaaggagaa
51
+ gtattttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGGTAGCATGCAACTTAA
52
+ AAtTTTGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATTACAAGGttAAtCtac
53
+ TGCCCTTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCTTtCAaTAaTTATACA
54
+ AACAAATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTCAAGTTTATCAATGTA
55
+ ATAAGTTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCTGTGGATAAGACTGCC
56
+ AGACTATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCCTTCACCTCCTTGCAG
57
+ ATGTACCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATTAAAATCTGGCTTCCT
58
+ cCTGAGCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGTGTGAAACCACTGAAA
59
+ GATCTTCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGACAGAGGCCACACTGA
60
+ TACCATTCCACTCTCATAAGATGAAGGTATCACACACACTTCATTTTGCTTCTGCGATGC
61
+ AGTGCCTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAaCTGCCtGGTTtAtAG
62
+ AGCTCTGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTGTCATGGCCTTGAGCA
63
+ AGTTGAACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTAGGTACTGTcAAATCC
64
+ ACATCACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAAAGAGTACAATGAGGG
65
+ TTTTTTAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGCAGCTGaaaaGATATT
66
+ CAGAAaTTGTTATATATGAGTGTGTTTGTATGCATGCAtATGtGTGATTTtCTtGCTTTA
67
+ CAGAACAGCTCCaTTTTGATAAGCTAtgTAAcgtGgAAACCTGCCAATCAaTGTTtgAAa
68
+ taGGAcaGgCTGAAACGATTCTTAAATGAAAAGCTTAAtgaCTTcTTgCAtttttaTACA
69
+ TCACTGTTCAGGtAaGGCCAGTAAGGgCAGTATgAaGAAtAaGTAACAATtAATAATTAT
70
+ CATTATGGCCATTTGCTGtcTGCATAAtAaCAAACTGAATGATGTCATCAGCCCTgTGCT
71
+ CAGTTGACAgAACTGACAAGTAGGCACACaaTGTCAGTGTGATCCATGAAACCT
72
+ >isotig00003 gene=isogroup00003 length=1917 numContigs=7
73
+ TAGCTGTGATCTAGTGGATCTGACTGGCCTTTTGATTATTTCAGCacGATTCTCAGACTA
74
+ CAGTTGTAAaCCTACTTCGACTACTACTACTActagtacTAACGGTGCAACGTTGTTATA
75
+ AGTTTGCCAAAGGTGAAACTTTAGCCTTAGGACtGTGTTTATTTTATTTGCAGTCGCATT
76
+ CgCCTAACTGTTTTCTGTTACTGGGTGCATTTAACTCACATTAATAGAGGATTTTtGACT
77
+ AGTtCcTAGAGAGTGGTGTTTCTGTTTTACCACCATGGCAAAAAAGGGAAaGCCTCGCCC
78
+ TGACCATAGGCCTCCTGCACACAACCCGCATTATGCTCATGATCCACCACCTTATTCACA
79
+ ACAGCAACCACCACTTCAACAGCAGAACTATGCACAACAAATGCATCATGGTGGAGGTGG
80
+ TGGAAATAGACAACATGCACGACcTAGACCTAGTCCACCTTCAGAAGTCAGTGACTGTGT
81
+ CAAGTACTCCCTTTTCTtGTATAACTGCATCTTTTGGGTAAGTATGCATTCCTCATGACT
82
+ GTTATGTATATGTACGTATTTTAGGTCATCCTGCAAGCAGGAaCTCGCGAAGAAGCcTCA
83
+ TtGGCTTATcAAAGCcGCAAGCTGACCGAAGTCAGTcTcTtAGTTTCATATTtAACGTCC
84
+ ATGATTATGAaTTgTCTATTCTCAACAACTcTGTAACTGGATGACATACATTAATCTTGG
85
+ AGTGACTCGAACAGGGGACCTTATGATTGGAAGGCACCGGCCTTAACTTAACCACTGAGC
86
+ TAACACTCCACATCTTTCAAATTATGTATATAATATATCTTTCAAGATATCTTTCAAATT
87
+ ATACTGATTTGTCTAGTAAGTACAGTACTGTATCACAAACAGTTCAAAACCGACAAAGTG
88
+ CTACACAAACGCAAAGGTTTAAGGTATGGTAGTGTTTGTCTGATGGTATACCTTATCTTT
89
+ TTGGTGATAAGAGCAAAAATGTTCTTTTAATGGTTAAAGTGTAAAGAGGATGTCTTTGTT
90
+ TTtCTGTgAAGTTTAGTTGTAACTTTCAGATACAaGaAAAaGTGAAATGTGCAATGTACT
91
+ GTAAGCTCTCAGAGTTACTCAGTCCTTTAGTTtGCtCTGTGAGATATATGCTGTGAGATA
92
+ TGcTtCAACAGTTCAATTTTCTAACTAAAATTTACATTGGTCATGCAATTTCTTTGTTCG
93
+ TTTGGTTTCTTGTTTTGTTGGTTAGGTTTTGGTGCTTTAAATTACGATGAGGATATATAA
94
+ CAGAGTGTGTTTTCaAACAGCTGGCTGTTATCTGCAGAATCTGGTCACAaCAAGTATACA
95
+ ACCCGCcCGCGTATGGACATATTAATATACCTTTCTCTCATGTGCACTAGAGTTTTTCAT
96
+ TTAGTTACCAAAAAAATCAGTTCTGTGACACATTTTTAGGTTAAAGGTTCAAGGTTGGAG
97
+ AATCCAATAATCATTATACGGTGTGAAGACTCGCGCAAAAAGAACGGCtATGCCgTAATC
98
+ TGACCTaGTTTCGAATGAGGTGTAACAGAAGTGTTAGACACCACCATCGATCCCAGAAAA
99
+ TACACACACAGCTTGCTACCgTCGGTAATTAGACACTAGTGTACAGTCAgTACATACAGC
100
+ TaCGGTCAATACCCAcaaaaCaGTGtACaTAGCAGCGaTGGACATcTCAGGTCCAGATAA
101
+ AGATAACAAGGTATCACGTTTCATTACTGTCTGCaTTTTGTAGCgACAaGAAGAAAACTt
102
+ CACTtGCAAGCAACGgAAAGTTAACTTTTtCAGAGCGCGGCACGCGGGTTGGGGCAAGTC
103
+ TTCCAAGCCTTTAAGTtGACAtcTTGCCTTTGGCTATCCAGGgTGACAAGATGATACTAG
104
+ CAGGTAgagtgactaattgagccctgtgtgagaaaccaatgcagaatctagcctagt
105
+ >isotig00004 gene=isogroup00003 length=1896 numContigs=6
106
+ TAGCTGTGATCTAGTGGATCTGACTGGCCTTTTGATTATTTCAGCacGATTCTCAGACTA
107
+ CAGTTGTAAaCCTACTTCGACTACTACTACTActagtacTAACGGTGCAACGTTGTTATA
108
+ AGTTTGCCAAAGGTGAAACTTTAGCCTTAGGACtGTGTTTATTTTATTTGCAGTCGCATT
109
+ CgCCTAACTGTTTTCTGTTACTGGGTGCATTTAACTCACATTAATAGAGGATTTTtGACT
110
+ AGTtCcTAGAGAGTGGTGTTTCTGTTTTACCACCATGGCAAAAAAGGGAAaGCCTCGCCC
111
+ TGACCATAGGCCTCCTGCACACAACCCGCATTATGCTCATGATCCACCACCTTATTCACA
112
+ ACAGCAACCACCACTTCAACAGCAGAACTATGCACAACAAATGCATCATGGTGGAGGTGG
113
+ TGGAAATAGACAACATGCACGACcTAGACCTAGTCCACCTTCAGAAGTCAGTGACTGTGT
114
+ CAAGTACTCCCTTTTCTtGTATAACTGCATCTTTTGGGTAAGTATGCATTCCTCATGACT
115
+ GTTATGTATATGTACGTATTTTAGGTCATCCTGCAAGCAGGAaCTCGCGAAGAAGCcTCA
116
+ TtGGCTTATcAAAGCcGCAAGCTGACCGAAGTCAGTcTcTtAGTTTCATATTtAACGTCC
117
+ ATGATTATGAaTTgTCTATTCTCAACAACTcTGTAACTGGATGACATACATTAATCTTGG
118
+ AGTGACTCGAACAGGGGACCTTATGATTGGAAGGCACCGGCCTTAACTTAACCACTGAGC
119
+ TAACACTCCACATCTTTCAAATTATGTATATAATATATCTTTCAAGATATCTTTCAAATT
120
+ ATACTGATTTGTCTAGTAAGTACAGTACTGTATCACAAACAGTTCAAAACCGACAAAGTG
121
+ CTACACAAACGCAAAGGTTTAAGGTATGGTAGTGTTTGTCTGATGGTATACCTTATCTTT
122
+ TTGGTGATAAGAGCAAAAATGTTCTTTTAATGGTTAAAGTGTAAAGAGGATGTCTTTGTT
123
+ TTtCTGTgAAGTTTAGTTGTAACTTTCAGATACAaGaAAAaGTGAAATGTGCAATGTACT
124
+ GTAAGCTCTCAGAGTTACTCAGTCCTTTAGTTtGCtCTGTGAGATATATGCTGTGAGATA
125
+ TGcTtCAACAGTTCAATTTTCTAACTAAAATTTACATTGGTCATGCAATTTCTTTGTTCG
126
+ TTTGGTTTCTTGTTTTGTTGGTTAGGTTTTGGTGCTTTAAATTACGATGAGGATATATAA
127
+ CAGAGTGTGTTTTCaAACAGCTGGCTGTTATCTGCAGAATCTGGTCACAaCAAGTATACA
128
+ ACCCGCcCGCGTATGGACATATTAATATACCTTTCTCTCATGTGCACTAGAGTTTTTCAT
129
+ TTAGTTACCAAAAAAATCAGTTCTGTGACACATTTTTAGGTTAAAGGTTCAAGGTTGGAG
130
+ AATCCAATAATCATTATACGGTGTGAAGACTCGCGCAAAAAGAACGGCtATGCCgTAATC
131
+ TGACCTaGTTTCGAATGAGGTGTAACAGAAGTGTTAGACACCACCATCGATCCCAGAAAA
132
+ TACACACACAGCTTGCTACCgTCGGTAATTAGACACTAGTGTACAGTCAgTACATACAGC
133
+ TaCGGTCAATACCCAcaaaaCaGTGtACaTAGCAGCGaTGGACATcTCAGGTCCAGATAA
134
+ AGATAACAAGGTATCACGTTTCATTACTGTCTGCaTTTTGTAGCgACAaGAAGAAAACTt
135
+ CACTtGCAAGCAACGgAAAGTTAACTTTTtCAGAGGGCAGCACTTGGTTTGGAGCGAATC
136
+ TTCAATGCCTTTAAGTCATCCTTTACTAGATGGAAGCTCTTCTTATGTAGTTTACTCttc
137
+ ATACTATCAAGACATTCTTAATGATATACTATGCTT
138
+ >isotig00005 gene=isogroup00003 length=1789 numContigs=6
139
+ ACATTCTTCAAGAGCTCTGCACCCACCAATCTAAAGTGACCAGCCAAGTGACTGACCTCA
140
+ GGGCACAGTTAGCAGCTTTGACCACAGGATGAGCTATGTAACAACTGAAtgaaTGGTGTT
141
+ CAtcGTTGATTGGGCAgTCAAAACAGCTGAATTTCTCTTGCGgAAGACATAAAGGCATTG
142
+ AAGACtcGCCcAAaccGtGTGcgcccTCTGAAAAaGTTAACTTTctGTTgCTTGCAaGTG
143
+ AAGTTTtcTtCTtGTCgCTACAAAATGCAGACAGTAaTgAAACGTGATACcTtGTtATCT
144
+ TTtATCTAgACctGAGATGtCcACGCTGCTATGTACACTGTGTTGTGGgTATTGACcGTA
145
+ GCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGACGGTAGCAAGCTGTGTGTGTA
146
+ TTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACACCtCAtTcGAAACTAGGTCA
147
+ GAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACAcccttttaaggagaagtatt
148
+ ttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGGTAGCATGCAACTTAAAAtTT
149
+ TGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATTACAAGGttAAtCtacTGCCC
150
+ TTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCTTtCAaTAaTTATACAAACAA
151
+ ATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTCAAGTTTATCAATGTAATAAG
152
+ TTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCTGTGGATAAGACTGCCAGACT
153
+ ATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCCTTCACCTCCTTGCAGATGTA
154
+ CCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATTAAAATCTGGCTTCCTcCTGA
155
+ GCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGTGTGAAACCACTGAAAGATCT
156
+ TCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGACAGAGGCCACACTGATACCA
157
+ TTCCACTCTCATAAGATGAAGGTATCACACACACTTCATTTTGCTTCTGCGATGCAGTGC
158
+ CTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAaCTGCCtGGTTtAtAGAGCTC
159
+ TGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTGTCATGGCCTTGAGCAAGTTG
160
+ AACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTAGGTACTGTcAAATCCACATC
161
+ ACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAAAGAGTACAATGAGGGTTTTT
162
+ TAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGCAGCTGaaaaGATATTCAGAA
163
+ aTTGTTATATATGAGTGTGTTTGTATGCATGCAtATGtGTGATTTtCTtGCTTTACAGAA
164
+ CAGCTCCaTTTTGATAAGCTAtgTAAcgtGgAAACCTGCCAATCAaTGTTtgAAataGGA
165
+ caGgCTGAAACGATTCTTAAATGAAAAGCTTAAtgaCTTcTTgCAtttttaTACATCACT
166
+ GTTCAGGtAaGGCCAGTAAGGgCAGTATgAaGAAtAaGTAACAATtAATAATTATCATTA
167
+ TGGCCATTTGCTGtcTGCATAAtAaCAAACTGAATGATGTCATCAGCCCTgTGCTCAGTT
168
+ GACAgAACTGACAAGTAGGCACACaaTGTCAGTGTGATCCATGAAACCT
169
+ >isotig00006 gene=isogroup00003 length=1747 numContigs=6
170
+ AGTTAAAAGTTGAAAAATTGGTGACCATATTTTGACACTCTAGCATATTTGGGAGCTATA
171
+ TACTGATTTGGGTTTCACCATGCACAGATGAGGTATATACATAAGTTGAAAGCCTGCAGC
172
+ TCTATATTAAAGGCATTGAAGACtcGCCcAAaccgtgTGcgcccTCTGAAAAaGTTAACT
173
+ TTCcGTTgCTTGCAaGTGAAGTTTtcTtCTTGTCGCTACAAAATGCAGACAGTAATGAAA
174
+ CGTGATACcTtGTtATCTTTtATCTAgACcTGAGATGtCcACGCTGCTATGTACACTGTG
175
+ TTGTGGgTATTGACcGTAGCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGACGG
176
+ TAGCAAGCTGTGTGTGTATTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACACC
177
+ tCAtTcGAAACTAGGTCAGAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACAcc
178
+ cttttaaggagaagtattttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGGTA
179
+ GCATGCAACTTAAAAtTTTGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATTAC
180
+ AAGGttAAtCtacTGCCCTTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCTTt
181
+ CAaTAaTTATACAAACAAATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTCAA
182
+ GTTTATCAATGTAATAAGTTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCTGT
183
+ GGATAAGACTGCCAGACTATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCCTT
184
+ CACCTCCTTGCAGATGTACCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATTAA
185
+ AATCTGGCTTCCTcCTGAGCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGTGT
186
+ GAAACCACTGAAAGATCTTCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGACA
187
+ GAGGCCACACTGATACCATTCCACTCTCATAAGATGAAGGTATCACACACACTTCATTTT
188
+ GCTTCTGCGATGCAGTGCCTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAaCT
189
+ GCCtGGTTtAtAGAGCTCTGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTGTC
190
+ ATGGCCTTGAGCAAGTTGAACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTAGG
191
+ TACTGTcAAATCCACATCACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAAAG
192
+ AGTACAATGAGGGTTTTTTAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGCAG
193
+ CTGaaaaGATATTCAGAAaTTGTTATATATGAGTGTGTTTGTATGCATGCAtATGtGTGA
194
+ TTTtCTtGCTTTACAGAACAGCTCCaTTTTGATAAGCTAtgTAAcgtGgAAACCTGCCAA
195
+ TCAaTGTTtgAAataGGAcaGgCTGAAACGATTCTTAAATGAAAAGCTTAAtgaCTTcTT
196
+ gCAtttttaTACATCACTGTTCAGGtAaGGCCAGTAAGGgCAGTATgAaGAAtAaGTAAC
197
+ AATtAATAATTATCATTATGGCCATTTGCTGtcTGCATAAtAaCAAACTGAATGATGTCA
198
+ TCAGCCCTgTGCTCAGTTGACAgAACTGACAAGTAGGCACACaaTGTCAGTGTGATCCAT
199
+ GAAACCT
200
+ >isotig00007 gene=isogroup00003 length=1749 numContigs=5
201
+ TGTGTGTGTGTGGTGCTTCCccTCTAGGGCTGTAAATTTCAAAGGAACCTTGCGCAAGAA
202
+ CAGtAGCTTGCGaCGTTTTTCAAaaCCAGAGGTTCTGAACTGAACTGTACTGACTACTGT
203
+ AGGGtacTTAAaGGCATTGAAGACTCGCCcAAaCCatgTGCCGCGctttGAAAAAGTTAA
204
+ CTTTCCGTTGCTTGCAAATGAcGTTTtcTtCTtGTCgCTACAAAATGCAGACAGTAaTgA
205
+ AACGTGATACcTtGTtATCTTTtATCTAgACctGAGATGtCcACGCTGCTATGTACACTG
206
+ TGTTGTGGgTATTGACcGTAGCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGAC
207
+ GGTAGCAAGCTGTGTGTGTATTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACA
208
+ CCtCAtTcGAAACTAGGTCAGAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACA
209
+ cccttttaaggagaagtattttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGG
210
+ TAGCATGCAACTTAAAAtTTTGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATT
211
+ ACAAGGttAAtCtacTGCCCTTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCT
212
+ TtCAaTAaTTATACAAACAAATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTC
213
+ AAGTTTATCAATGTAATAAGTTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCT
214
+ GTGGATAAGACTGCCAGACTATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCC
215
+ TTCACCTCCTTGCAGATGTACCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATT
216
+ AAAATCTGGCTTCCTcCTGAGCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGT
217
+ GTGAAACCACTGAAAGATCTTCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGA
218
+ CAGAGGCCACACTGATACCATTCCACTCTCATAAGATGAAGGTATCACACACACTTCATT
219
+ TTGCTTCTGCGATGCAGTGCCTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAa
220
+ CTGCCtGGTTtAtAGAGCTCTGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTG
221
+ TCATGGCCTTGAGCAAGTTGAACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTA
222
+ GGTACTGTcAAATCCACATCACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAA
223
+ AGAGTACAATGAGGGTTTTTTAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGC
224
+ AGCTGaaaaGATATTCAGAAaTTGTTATATATGAGTGTGTTTGTATGCATGCAtATGtGT
225
+ GATTTtCTtGCTTTACAGAACAGCTCCaTTTTGATAAGCTAtgTAAcgtGgAAACCTGCC
226
+ AATCAaTGTTtgAAataGGAcaGgCTGAAACGATTCTTAAATGAAAAGCTTAAtgaCTTc
227
+ TTgCAtttttaTACATCACTGTTCAGGtAaGGCCAGTAAGGgCAGTATgAaGAAtAaGTA
228
+ ACAATtAATAATTATCATTATGGCCATTTGCTGtcTGCATAAtAaCAAACTGAATGATGT
229
+ CATCAGCCCTgTGCTCAGTTGACAgAACTGACAAGTAGGCACACaaTGTCAGTGTGATCC
230
+ ATGAAACCT
231
+ >isotig00008 gene=isogroup00003 length=1726 numContigs=6
232
+ AGGTTTCATGGATCACACTGACAtTGTGTGCCTACTTGTCAGTTcTGTCAACTGAGCAcA
233
+ GGGCTGATGACATCATTCAGTTTGttattATGCAggaCAGCAAATGGCCATAATGATAAT
234
+ TATTAaTTGTTACTtaTTCTtcATACTGCCcTTACTGGCCTtaCCTGAACAGTGATGTAt
235
+ caaaaTGcAAgAAGtcaTTAAGCTTTTCATTTAAGAATCGTTTCAGCctgTCCtaatTTt
236
+ cAAaCAtTGATTGGCAGGTTTCcacgTTAcaTAGCTTATCAAAAtGGAGCTGTTCTGTAA
237
+ AGCAAGaAAATCACaCATaTGCATGCATACAAACACACTCATATATAACAAtTTCTGAAT
238
+ ATCTtttCAGCTGCCCAAAGGTACTGGAATGCAAGCTCTGTTGGGTGAATTAAAAAaCCc
239
+ TCATTGTACTCTTTTATCATGGTCAGCGTAGCTGGAACCAGCAATGATGGTGATGTGGAT
240
+ TTGACAGTACCTAGACAAGAATTCAATAGCTGCTTTGTGTTGAAGTGGGTTCAACTTGCT
241
+ CAAGGCCATGACAGGATGACGACTAGACATtAGGATGAATTATCTGTTGCAGAGCTCTAT
242
+ AAaCCAGGCAGTtGTtAAAaCATCCATTGGGTGACCcTCACAGTCTACCAGGCACTGCAT
243
+ CGCAGAAGCAAAATGAAGTGTGTGTgATACCTTCATCTTATGAGAGTGGAATGGTATCAG
244
+ TGTGGCCTCTGTCAATTTGGCTTCAAGTCTTTGTTTTCTTGAAAGCTGAGAaGATCTTTC
245
+ AGTGGTTTCACACTGACCTCTGTACTTGACAATTCATGTGTATTTGCCAGCTCAGgAGGA
246
+ AGCCAGATTTTAATTACATTAACCAATGCTGACTTTTTTttGGACATGTGGTACATCTGC
247
+ AAGGAGGTGAAGGAAGCAGTTGGAGTTGCATGGATGTGGAATCTTGTTGATAGTCTGGCA
248
+ GTCTTATCCACAGATTATCCCAAAGCTTCTCCACATTGCCACATTCAGAACTTATTACAT
249
+ TGATAAACTTGAAAATtGCAGGAATCTAaCcAaGCACCcATCAaGGGAaTTTGTTTGTAT
250
+ AATtATtGAAaGCTGTGACcTTCTGATGTGACAGACTAATGTGAAaTAAAGGgCAgtaGa
251
+ TTaCCTTGTaaTGAACCttGTTATTGTTTGATTGTATCTAAtGTTTGCAaaTTTTAAGTT
252
+ GCATGCTACCAATTGAAACATAATTCTTTCTCTAttaatgggatataaaatacttctcct
253
+ taaaagggTGTgAaGACTcggCACAAAGAAACGTCtaTGCcGgtAaTCTGACCTAGTTTc
254
+ gAatGaGGTGTAACagAAGTgTtAGACACcACCAttGATCCcAGAAAATACACACACAGC
255
+ TTGCTACCGTCGGTAaTTAGACACTAGTGTACAGTCAaTACATACAGCTAcGgTCAATAC
256
+ CCACAaCACAgTGTAcATAGCAGCGaTGgACATCTCAGGTCTAGATAAAAGATAaCAAGG
257
+ TATCACGTTTCATtaCTGTCTGCATTTtGTAGCgaCAagAAGAAAAcgtCATTtGCAAGC
258
+ AaTGgAAAGTtAACTTTTTCaGAGCGcagCAcGCgggTTGGGGCAAGTCTTCCAAGCCTT
259
+ TAAGTtGACAtcTTGCCTTTGGCTATCCAGGgTGACAAGATGATACTAGCAGGTAgagtg
260
+ actaattgagccctgtgtgagaaaccaatgcagaatctagcctagt
261
+ >isotig00009 gene=isogroup00003 length=1827 numContigs=2
262
+ TAGCTGTGATCTAGTGGATCTGACTGGCCTTTTGATTATTTCAGCacGATTCTCAGACTA
263
+ CAGTTGTAAaCCTACTTCGACTACTACTACTActagtacTAACGGTGCAACGTTGTTATA
264
+ AGTTTGCCAAAGGTGAAACTTTAGCCTTAGGACtGTGTTTATTTTATTTGCAGTCGCATT
265
+ CgCCTAACTGTTTTCTGTTACTGGGTGCATTTAACTCACATTAATAGAGGATTTTtGACT
266
+ AGTtCcTAGAGAGTGGTGTTTCTGTTTTACCACCATGGCAAAAAAGGGAAaGCCTCGCCC
267
+ TGACCATAGGCCTCCTGCACACAACCCGCATTATGCTCATGATCCACCACCTTATTCACA
268
+ ACAGCAACCACCACTTCAACAGCAGAACTATGCACAACAAATGCATCATGGTGGAGGTGG
269
+ TGGAAATAGACAACATGCACGACcTAGACCTAGTCCACCTTCAGAAGTCAGTGACTGTGT
270
+ CAAGTACTCCCTTTTCTtGTATAACTGCATCTTTTGgaTTGtCGGCCTTttCTTTATtGC
271
+ AGCAGGTATCTGGgCATTTCACGATAGGGGTGTTTTTAATGAATTCCAGTCACTTAGTAC
272
+ CAATGAGGTCTCCTTTCTCACTGATCCTGTTATTTGGCTGTTCGTCCTCGGAGGTGTAGT
273
+ TTTCATGCTGGGAACCCTCGGATGTCTgGGGgCCCTCAGAGAAAaTATCTGCATGCTGAA
274
+ GTGTTTTAGCATAATCATGGGGCTTATACTGCTGCTGGAAATTGGAGGTGGATGTGCGAT
275
+ ATACTTCTATCGTGCACAGATTCAGGCACAGTTTCAAAAGTCCTTAACAGATGTGaCCAT
276
+ AACAGATTACAGAGAAAATGCTGATTTCCAGGATCTCATAGACGCATTACAATCCGGTCT
277
+ TTCTTGTTGTGGTGTCAATTCCTatGAAGACTGGGATAATAATATTTATTTCAACTGTAG
278
+ TGGTCCTGCCAATAACCCTGAAGCcttGTGGTGTGCCTTtCTCCTGTTGTATACCGGATC
279
+ AAGCAAGCGGAGTAGCCAACACCCAGTGCGGTTATGGAGTTCGTTCCCCCGAACAACAAA
280
+ ATACTTTCCACACAAAGATTTACACCACTGGCTGTGCGGATATGTTTACAATGTGGATTA
281
+ ATAGGTACCTATATTACATAGCAGGCATTGCTGGGGTCATTGTCTTGGTCGAGTtGTTTG
282
+ GATTCTGTTTTGCACATTCCCTCATCAACGACATCAAACGCCAAAAGGCCCGCTGGGCGC
283
+ ATCGATAATTCATTCCAGGATGTTGGTGgATGATGCTACTCAAGGGagAAGACTGACAGT
284
+ GCCTTTtGGTCAaTATCGTGTAGCATCAGGAAGGAGGTAGTACCTCCTCAACTAACCaTA
285
+ ACAGAATTTGTCCAGTTTGTAACATCGTCAAGAAATAAACAGACTTTTTTTACCATTAGG
286
+ ACgTGATAATACTACCACGTAACCTCTCAAAGCACAAAAAGCAAAAAGCAAATATCTCCT
287
+ TGTTTTAAAATTAGaagGTCTATCTCAGATAACAACCACAGAACATgTGGAGTTTTCCtT
288
+ TATGCTATCATAAAGATATAAATATATATAAAATTGAGGTAGcATCtTGGCTACCCACCA
289
+ AAATCATTTTTTTTCCAGTTTGaAACATCATGGAACATTTCAGAACAAAGATCATTTCAG
290
+ TCGTTACCACACTCAAGAgaTTGCTGTcGTCAaCaTTTtGtaGCTTTTtAAtGTCTTGAT
291
+ CTTCGTCGACATCGTCAATGTGTAAACTATTCTCGACGAGAGATTAGTGTCTAATACTGC
292
+ GGGTgATTTGATATAAATCTCACTTGG
293
+ >isotig00010 gene=isogroup00003 length=1650 numContigs=5
294
+ TGAATGAGAAAtGAAATTTAGCGAAGAAATCACCTTGTAAATTAAAAACTAAAATGGCTT
295
+ TCACACAAATTAaCAGTAAAtGgAGAATGTTTTTAAAGCAATATATGCAGTACAGCcATT
296
+ CATTGGAAAACAGTAAcAAAaTACATTTATCTTGTtcATTTTtACctCctGCAAaacTTA
297
+ cAaCcGTTAATTATGTAGATTGGATGGCACTAACAGGGTACTTGTCTTATCTGCCTATTG
298
+ GATAATGTGGcATTAATACTACTGTGTATGGGCACTGAGGCTGAGAGTGCAGTAAGTTtA
299
+ AAGGCATTGAAGACTCtCCCCGAaCcGCGtGCCGGGCTctGAAAAAGTtAaCTGCTCGCA
300
+ AaTtAcGTTTtCTtCTTGTCaCTaCAAAaTGCAGACATTaaTGAAACGTGATACCTTGTt
301
+ ATCTTTTATCTAGACCTGAGATGTCcAtCGCTGCTATgTACAcTGTGTTGTGGGTATTGA
302
+ CcgTAGCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGACGGTAGCAAGCTGTGT
303
+ GTGTATTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACACCtCAtTcGAAACTA
304
+ GGTCAGAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACAcccttttaaggagaa
305
+ gtattttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGGTAGCATGCAACTTAA
306
+ AAtTTTGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATTACAAGGttAAtCtac
307
+ TGCCCTTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCTTtCAaTAaTTATACA
308
+ AACAAATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTCAAGTTTATCAATGTA
309
+ ATAAGTTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCTGTGGATAAGACTGCC
310
+ AGACTATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCCTTCACCTCCTTGCAG
311
+ ATGTACCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATTAAAATCTGGCTTCCT
312
+ cCTGAGCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGTGTGAAACCACTGAAA
313
+ GATCTTCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGACAGAGGCCACACTGA
314
+ TACCATTCCACTCTCATAAGATGAAGGTATCACACACACTTCATTTTGCTTCTGCGATGC
315
+ AGTGCCTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAaCTGCCtGGTTtAtAG
316
+ AGCTCTGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTGTCATGGCCTTGAGCA
317
+ AGTTGAACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTAGGTACTGTcAAATCC
318
+ ACATCACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAAAGAGTACAATGAGGG
319
+ TTTTTTAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGCAGCTGATATCCATTT
320
+ TGTTCCTCGTATgCCTGTCAAAATCTGACATTctGagTCGCTTCGTTTGTTCGCAACGAG
321
+ CACAGTGTGCAAAGctGCTATATATTGTCC
322
+ >isotig00011 gene=isogroup00003 length=1525 numContigs=6
323
+ ACATTCTTCAAGAGCTCTGCACCCACCAATCTAAAGTGACCAGCCAAGTGACTGACCTCA
324
+ GGGCACAGTTAGCAGCTTTGACCACAGGATGAGCTATGTAACAACTGAAtgaaTGGTGTT
325
+ CAtcGTTGATTGGGCAgTCAAAACAGCTGAATTTCTCTTGCGgAAGACATAAAGGCATTG
326
+ AAGACtcGCCcAAaccGtGTGcgcccTCTGAAAAaGTTAACTTTctGTTgCTTGCAaGTG
327
+ AAGTTTtcTtCTtGTCgCTACAAAATGCAGACAGTAaTgAAACGTGATACcTtGTtATCT
328
+ TTtATCTAgACctGAGATGtCcACGCTGCTATGTACACTGTGTTGTGGgTATTGACcGTA
329
+ GCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGACGGTAGCAAGCTGTGTGTGTA
330
+ TTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACACCtCAtTcGAAACTAGGTCA
331
+ GAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACAcccttttaaggagaagtatt
332
+ ttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGGTAGCATGCAACTTAAAAtTT
333
+ TGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATTACAAGGttAAtCtacTGCCC
334
+ TTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCTTtCAaTAaTTATACAAACAA
335
+ ATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTCAAGTTTATCAATGTAATAAG
336
+ TTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCTGTGGATAAGACTGCCAGACT
337
+ ATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCCTTCACCTCCTTGCAGATGTA
338
+ CCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATTAAAATCTGGCTTCCTcCTGA
339
+ GCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGTGTGAAACCACTGAAAGATCT
340
+ TCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGACAGAGGCCACACTGATACCA
341
+ TTCCACTCTCATAAGATGAAGGTATCACACACACTTCATTTTGCTTCTGCGATGCAGTGC
342
+ CTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAaCTGCCtGGTTtAtAGAGCTC
343
+ TGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTGTCATGGCCTTGAGCAAGTTG
344
+ AACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTAGGTACTGTcAAATCCACATC
345
+ ACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAAAGAGTACAATGAGGGTTTTT
346
+ TAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGCAGCTGATATCCATTTTGTTC
347
+ CTCGTATgCCTGTCAAAATCTGACATTctGagTCGCTTCGTTTGTTCGCAACGAGCACAG
348
+ TGTGCAAAGctGCTATATATTGTCC
349
+ >isotig00012 gene=isogroup00003 length=1483 numContigs=6
350
+ AGTTAAAAGTTGAAAAATTGGTGACCATATTTTGACACTCTAGCATATTTGGGAGCTATA
351
+ TACTGATTTGGGTTTCACCATGCACAGATGAGGTATATACATAAGTTGAAAGCCTGCAGC
352
+ TCTATATTAAAGGCATTGAAGACtcGCCcAAaccgtgTGcgcccTCTGAAAAaGTTAACT
353
+ TTCcGTTgCTTGCAaGTGAAGTTTtcTtCTTGTCGCTACAAAATGCAGACAGTAATGAAA
354
+ CGTGATACcTtGTtATCTTTtATCTAgACcTGAGATGtCcACGCTGCTATGTACACTGTG
355
+ TTGTGGgTATTGACcGTAGCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGACGG
356
+ TAGCAAGCTGTGTGTGTATTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACACC
357
+ tCAtTcGAAACTAGGTCAGAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACAcc
358
+ cttttaaggagaagtattttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGGTA
359
+ GCATGCAACTTAAAAtTTTGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATTAC
360
+ AAGGttAAtCtacTGCCCTTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCTTt
361
+ CAaTAaTTATACAAACAAATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTCAA
362
+ GTTTATCAATGTAATAAGTTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCTGT
363
+ GGATAAGACTGCCAGACTATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCCTT
364
+ CACCTCCTTGCAGATGTACCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATTAA
365
+ AATCTGGCTTCCTcCTGAGCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGTGT
366
+ GAAACCACTGAAAGATCTTCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGACA
367
+ GAGGCCACACTGATACCATTCCACTCTCATAAGATGAAGGTATCACACACACTTCATTTT
368
+ GCTTCTGCGATGCAGTGCCTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAaCT
369
+ GCCtGGTTtAtAGAGCTCTGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTGTC
370
+ ATGGCCTTGAGCAAGTTGAACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTAGG
371
+ TACTGTcAAATCCACATCACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAAAG
372
+ AGTACAATGAGGGTTTTTTAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGCAG
373
+ CTGATATCCATTTTGTTCCTCGTATgCCTGTCAAAATCTGACATTctGagTCGCTTCGTT
374
+ TGTTCGCAACGAGCACAGTGTGCAAAGctGCTATATATTGTCC
375
+ >isotig00013 gene=isogroup00003 length=1485 numContigs=5
376
+ TGTGTGTGTGTGGTGCTTCCccTCTAGGGCTGTAAATTTCAAAGGAACCTTGCGCAAGAA
377
+ CAGtAGCTTGCGaCGTTTTTCAAaaCCAGAGGTTCTGAACTGAACTGTACTGACTACTGT
378
+ AGGGtacTTAAaGGCATTGAAGACTCGCCcAAaCCatgTGCCGCGctttGAAAAAGTTAA
379
+ CTTTCCGTTGCTTGCAAATGAcGTTTtcTtCTtGTCgCTACAAAATGCAGACAGTAaTgA
380
+ AACGTGATACcTtGTtATCTTTtATCTAgACctGAGATGtCcACGCTGCTATGTACACTG
381
+ TGTTGTGGgTATTGACcGTAGCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGAC
382
+ GGTAGCAAGCTGTGTGTGTATTTTCTGGGATCaaTGGTGgTGTCTAACACTTCtGTTACA
383
+ CCtCAtTcGAAACTAGGTCAGAtTAcCgGCATAGACGTTTCTTTGTGCcgAGTCtTCACA
384
+ cccttttaaggagaagtattttatatcccattaaTAGAGAAAGAATTATGTTTCAATTGG
385
+ TAGCATGCAACTTAAAAtTTTGCAAACaTTAGATACAATCAAACAATAACAAGGTTCATT
386
+ ACAAGGttAAtCtacTGCCCTTTATTtCACATTaGTCTGTCACATCAGAAGgTCACAGCT
387
+ TtCAaTAaTTATACAAACAAATtCCCTtGATGGgTGCTtGgTtAGATTCCTGCaatTTTC
388
+ AAGTTTATCAATGTAATAAGTTCTGAATGTGGCAATGTGGaaGAAGCtTtGGGATAATCT
389
+ GTGGATAAGACTGCCAGACTATCAACAAGATTCCACATCCATGCAACTCCAACTGCTTCC
390
+ TTCACCTCCTTGCAGATGTACCACATGTCCaaAAAAAAGTCAGCATTGGTTAATGTAATT
391
+ AAAATCTGGCTTCCTcCTGAGCTGGCAAATACACATGAATTGTCAAGTACAGAGGTCAGT
392
+ GTGAAACCACTGAAAGATCTTCTCAGCTTTCAAGAAAACAAAGACTTGAAGCCAAATTGA
393
+ CAGAGGCCACACTGATACCATTCCACTCTCATAAGATGAAGGTATCACACACACTTCATT
394
+ TTGCTTCTGCGATGCAGTGCCTGGTAGACTGTGAGGgTCACCCAATGGATgtTTTAaCAa
395
+ CTGCCtGGTTtAtAGAGCTCTGCAACAGATAATTCATCCTAaTGTCTAGTCGTCATCCTG
396
+ TCATGGCCTTGAGCAAGTTGAACCCACTTCAACACAAAGCAGCTATTGAATTCTTGTCTA
397
+ GGTACTGTcAAATCCACATCACCATCATTGCttGGTTCCAGCTaCGcTGACCATGaTAAA
398
+ AGAGTACAATGAGGGTTTTTTAATTCACCCAACAGAGCTTGCATTCCAGTACCTTTGGGC
399
+ AGCTGATATCCATTTTGTTCCTCGTATgCCTGTCAAAATCTGACATTctGagTCGCTTCG
400
+ TTTGTTCGCAACGAGCACAGTGTGCAAAGctGCTATATATTGTCC
401
+ >isotig00014 gene=isogroup00003 length=1459 numContigs=6
402
+ GGACAATATATAGCagCTTTGCACACTGTGCTCGTTGCGAACAAACGAAGCGActCagAA
403
+ TGTCAGATTTTGACAGGcATACGAGGAACAAAATGGATATCAGCTGCCCAAAGGTACTGG
404
+ AATGCAAGCTCTGTTGGGTGAATTAAAAAaCCcTCATTGTACTCTTTTATCATGGTCAGC
405
+ GTAGCTGGAACCAGCAATGATGGTGATGTGGATTTGACAGTACCTAGACAAGAATTCAAT
406
+ AGCTGCTTTGTGTTGAAGTGGGTTCAACTTGCTCAAGGCCATGACAGGATGACGACTAGA
407
+ CATtAGGATGAATTATCTGTTGCAGAGCTCTATAAaCCAGGCAGTtGTtAAAaCATCCAT
408
+ TGGGTGACCcTCACAGTCTACCAGGCACTGCATCGCAGAAGCAAAATGAAGTGTGTGTgA
409
+ TACCTTCATCTTATGAGAGTGGAATGGTATCAGTGTGGCCTCTGTCAATTTGGCTTCAAG
410
+ TCTTTGTTTTCTTGAAAGCTGAGAaGATCTTTCAGTGGTTTCACACTGACCTCTGTACTT
411
+ GACAATTCATGTGTATTTGCCAGCTCAGgAGGAAGCCAGATTTTAATTACATTAACCAAT
412
+ GCTGACTTTTTTttGGACATGTGGTACATCTGCAAGGAGGTGAAGGAAGCAGTTGGAGTT
413
+ GCATGGATGTGGAATCTTGTTGATAGTCTGGCAGTCTTATCCACAGATTATCCCAAAGCT
414
+ TCTCCACATTGCCACATTCAGAACTTATTACATTGATAAACTTGAAAATtGCAGGAATCT
415
+ AaCcAaGCACCcATCAaGGGAaTTTGTTTGTATAATtATtGAAaGCTGTGACcTTCTGAT
416
+ GTGACAGACTAATGTGAAaTAAAGGgCAgtaGaTTaCCTTGTaaTGAACCttGTTATTGT
417
+ TTGATTGTATCTAAtGTTTGCAaaTTTTAAGTTGCATGCTACCAATTGAAACATAATTCT
418
+ TTCTCTAttaatgggatataaaatacttctccttaaaagggTGTgAaGACTcggCACAAA
419
+ GAAACGTCtaTGCcGgtAaTCTGACCTAGTTTcgAatGaGGTGTAACagAAGTgTtAGAC
420
+ ACcACCAttGATCCcAGAAAATACACACACAGCTTGCTACCGTCGGTAaTTAGACACTAG
421
+ TGTACAGTCAaTACATACAGCTAcGgTCAATACCCACAaCACAgTGTAcATAGCAGCGaT
422
+ GgACATCTCAGGTCTAGATAAAAGATAaCAAGGTATCACGTTTCATtaCTGTCTGCATTT
423
+ tGTAGCgaCAagAAGAAAAcgtCATTtGCAAGCAaTGgAAAGTtAACTTTTTCaGAGCGc
424
+ agCAcGCgggTTGGGGCAAGTCTTCCAAGCCTTTAAGTtGACAtcTTGCCTTTGGCTATC
425
+ CAGGgTGACAAGATGATACTAGCAGGTAgagtgactaattgagccctgtgtgagaaacca
426
+ atgcagaatctagcctagt
427
+ >isotig00015 gene=isogroup00003 length=1138 numContigs=4
428
+ TGAATGAGAAAtGAAATTTAGCGAAGAAATCACCTTGTAAATTAAAAACTAAAATGGCTT
429
+ TCACACAAATTAaCAGTAAAtGgAGAATGTTTTTAAAGCAATATATGCAGTACAGCcATT
430
+ CATTGGAAAACAGTAAcAAAaTACATTTATCTTGTtcATTTTtACctCctGCAAaacTTA
431
+ cAaCcGTTAATTATGTAGATTGGATGGCACTAACAGGGTACTTGTCTTATCTGCCTATTG
432
+ GATAATGTGGcATTAATACTACTGTGTATGGGCACTGAGGCTGAGAGTGCAGTAAGTTtA
433
+ AAGGCATTGAAGACTCtCCCCGAaCcGCGtGCCGGGCTctGAAAAAGTtAaCTGCTCGCA
434
+ AaTtAcGTTTtCTtCTTGTCaCTaCAAAaTGCAGACATTaaTGAAACGTGATACCTTGTt
435
+ ATCTTTTATCTAGACCTGAGATGTCcAtCGCTGCTATgTACAcTGTGTTGTGGGTATTGA
436
+ CcgTAGCTGTATGTATtGACTGTACACTAGTGTCTAATtACCGACGGTAGCAAGCTGtGT
437
+ TTGTATTTTCtGGGATCGatGGCAGTGTCTAACACTTcTGTtACACCTCATtcGAAACTA
438
+ GGTCAGATTACCGGCATTAGACGTtCTTTTTGCgCGAGTCTTCACACCCTTTtAAAGctA
439
+ CTCCAtgCTGACAcACGtGgTTCCGGacTACAGAGCAATAAAAaGTAACATTCACTCCTT
440
+ GAagTtaCTCCATGCTGgCTGCCCTTAtaGATGTGGCaatGGAtaCGGACgAGAGACTTC
441
+ ACTTCTGTTGGTTGCaaaaTTCCATACACCATGGAAGCATGGAACTCACAAAACTAGtGT
442
+ TGTAgAGGGGGAGCATAGtctATGtAAATGTatGTTCTACGCCTCTGTCCCaGCTGGAAT
443
+ GGCCAGTTTATCTGCCACAATGAAGAATTGTTTGGGgTTCAATtCTGGtCcgaGAGATAG
444
+ GATGAAaGGCTGtcAATATTGTCCTTGTCTGCCCTGTGCTGCgCTCTCAATATCTGTGCC
445
+ CTcccTCGaacaCTGTTattCACTTCTTCGTGGAAACCTTTATTTGTAAGAAAAGTTCTT
446
+ AAAGACTCAGCCAttGCTAATTTATAACCTTTACTCTAGCTTAGACATACGGTCGTCT
447
+ >isotig00016 gene=isogroup00003 length=2185 numContigs=5
448
+ ATGAATGCTGGCCAGATATTTATCGCCTTGATGGCACAACTTTTCAACGCATGTCTTCTC
449
+ GTTTCTTCCAATTTCGATAGTGACATAGCTGACTCGACACTAGGAAAGAGATCTACAGGG
450
+ TTCGTGGACACGTTTGGGAAGCGTTTTGTTGACTCATTCGGTAAACGCGTGGACGAATTT
451
+ GATTATGATCACAATGGGAACTATGCCGAACAAAGTGAACAATCTTCATACATCAGTCCT
452
+ CAACTCAAACGAGGTCAAAAAGGACTGAGAAGCGGATCATTTATTGATGCTTTCGGGAAA
453
+ CGGAGTTCCTTCCAAGAAGTCGATGAGAAGAGGTTCGCGGACTCATTCGGCAAAAGATTC
454
+ GCGGACTCATTTGGGAAAAGGAGCCCGGTAGGATTTGTTGACACCTTGGGTAAAAGATTT
455
+ GCGGTCTCATTCGGTAAAAGAAATACAGTCGGATTTGTTGACACTTTGGGTAAAAGATTC
456
+ GCAGACTCGTTCGGCAAGCGGTCTCAACAAGGTTTTGTAGATGCATTCGGCAAACGATAC
457
+ CAGGGCGTTTACTAA
458
+ >isotig00017 gene=isogroup00003 length=2185 numContigs=5
459
+ ATGTGTGGCTGCATTGACGACGCAGAGTTTGCAGCAACTCATCAAGTCCAGTTTTGTGAA
460
+ ATCAATTCTGCGACATTCAATCCAAGAGAAGATCCTCTTATTGATTGTCTATATTCGGCC
461
+ AAAGACAGCGCTATTTGCTCGTGCCCTGAACTTTGCAGTGAACTCGTATACGAAGTCTCC
462
+ AAAGACTCTGTTGATTGGCCAAATATGGCAAACCTGCTCCCGTTCTTGGAGCAAATAAAT
463
+ TCATCAATGACGGGCAAACCTGCCCGAACATTTTTCGACTCGATAATTAACCACTACAGA
464
+ GCCGGTCGCCATGATGAAGCACTAGATTCAGTTCGGAGTACGTTTCTTCAACTCAATATC
465
+ TACATAGAGACAATGGAGGTTGAAGAATACACGGACAGACCCGTTTATGAT
data/lib/npsearch.rb CHANGED
@@ -54,8 +54,8 @@ module NpSearch
  end
 
  def initialise_seqs(entry)
- return if entry.aaseq.length > @opt[:max_seq_length]
- sp = Signalp.analyse_sequence(entry.aaseq)
+ return if entry.aaseq.length > @opt[:max_orf_length]
+ sp = Signalp.analyse_sequence(entry.aaseq.to_s)
  return if sp[:sp] == 'N'
  # seq = Sequence.new(entry.entry_id, entry.definition, entry.aaseq, sp)
  seq = Sequence.new(entry, sp)
@@ -1,4 +1,5 @@
  require 'bio'
+
  # Top level module / namespace.
  module NpSearch
  # A class that validates the command line opts
@@ -6,6 +7,7 @@ module NpSearch
  class << self
  def run(opt)
  assert_file_present('input fasta file', opt[:input_file])
+ opt[:input_file] = File.expand_path(opt[:input_file])
  assert_input_file_not_empty(opt[:input_file])
  assert_input_file_probably_fasta(opt[:input_file])
  opt[:type] = assert_input_sequence(opt[:input_file])
@@ -48,8 +50,9 @@ module NpSearch
  exit 1
  end
 
+ # determine file sequence type based on first 500 lines
  def type_of_sequences(file)
- fasta_content = IO.binread(file)
+ fasta_content = File.foreach(file).first(500).join("\n")
  # the first sequence does not need to have a fasta definition line
  sequences = fasta_content.split(/^>.*$/).delete_if(&:empty?)
  # get all sequence types
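The hunk above swaps a whole-file read for a sample of the first 500 lines. A minimal sketch of that idea follows; it is not the gem's code. `head_of_file` and `guess_type` are hypothetical names, and the classification rule (a sample made only of A, C, G, T, U or N counts as nucleotide) is an assumption for illustration. Lines yielded by `File.foreach` keep their trailing newline, so the sketch rebuilds the text with a plain `join`.

```ruby
# Hypothetical helper: return the first `limit` lines of a file as one string.
# File.foreach without a block gives a lazy enumerator, so only the sampled
# lines are read, however large the file is.
def head_of_file(path, limit = 500)
  File.foreach(path).first(limit).join
end

# Hypothetical classifier (assumed rule, not taken from the gem): drop the
# definition lines, then call the sample nucleotide when every remaining
# character is one of A, C, G, T, U or N, and protein otherwise.
def guess_type(fasta_text)
  residues = fasta_text.split(/^>.*$/).join.gsub(/\s+/, '')
  return :unknown if residues.empty?
  residues =~ /\A[ACGTUN]+\z/i ? :nucleotide : :protein
end
```

Sampling keeps start-up time flat for very large inputs, at the cost of misjudging a file whose first records are not representative.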
@@ -18,8 +18,8 @@ module NpSearch
  sorted_sequences.each do |s|
  if input_type == :protein
  f.puts ">#{s.defline}\n#{s.signalp}#{s.seq}"
- elsif input_type == :nucleotide
- f.puts ">#{s.defline}-(frame:#{s.translated_frame})"
+ elsif input_type == :genetic
+ f.puts ">#{s.defline}"
  f.puts "#{s.signalp}#{s.seq}"
  end
  end
@@ -1,4 +1,6 @@
  require 'forwardable'
+ require 'open3'
+ require 'timeout'
 
  # Top level module / namespace.
  module NpSearch
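The two new requires support running an external program with a time limit. The sketch below shows one way to combine them, under stated assumptions: `run_with_timeout` is a hypothetical helper, not part of NpSearch, and it passes the input on stdin instead of building a shell `echo` pipeline. It does not read stderr, so a command that writes a great deal to stderr could stall until the timeout fires.

```ruby
require 'open3'
require 'timeout'

# Hypothetical helper: feed `input` to a command on stdin and return its
# stdout, or nil if the command does not finish within `seconds`.
# `cmd` is an array such as ['cat'] or ['sleep', '5'].
def run_with_timeout(cmd, input, seconds)
  Open3.popen3(*cmd) do |stdin, stdout, _stderr, wait_thr|
    begin
      Timeout.timeout(seconds) do
        stdin.write(input)
        stdin.close
        out = stdout.read # blocks until the child closes its stdout
        wait_thr.value    # wait for the child to exit
        out
      end
    rescue Timeout::Error
      begin
        # Stop the child so it does not keep running in the background.
        Process.kill('KILL', wait_thr.pid)
      rescue Errno::ESRCH
        # The child had already exited; nothing to clean up.
      end
      nil
    end
  end
end
```

Returning `nil` on timeout leaves it to the caller to decide what a missing result means, for example substituting a placeholder record.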
@@ -11,33 +13,34 @@ module NpSearch
  def analyse_sequence(seq)
  sp_headers = %w(name cmax cmax_pos ymax ymax_pos smax smax_pos smean d
  sp dmaxcut networks orf)
- data = setup_analysis(seq)
- orf_results = []
- s = `echo "#{data[:fasta]}\n" | #{opt[:signalp_path]} -t euk \
- -f short -U 0.34 -u 0.34`
- sp_results = s.split("\n").delete_if { |l| l[0] == '#' }
- sp_results.each_with_index do |line, idx|
- line = line + ' ' + data[:seq][idx].to_s
- orf_results << Hash[sp_headers.map(&:to_sym).zip(line.split)]
+ seqs = setup_analysis(seq)
+ sp_results = []
+ seqs.each do |seq|
+ sp_results << run_signalp(seq, sp_headers)
  end
- orf_results.sort_by { |h| h[:d] }.reverse[0]
+ sp_results.sort_by { |h| h[:d] }.reverse[0]
  end
 
- def setup_analysis(seq)
- if opt[:type] == :protein
- data = { seq: [seq], fasta: ">seq\n#{seq}" }
- else
- orfs = seq.scan(/(?=(M\w+))./).flatten
- orfs.unshift(seq)
- data = { seq: orfs, fasta: create_orf_fasta(orfs) }
+ private
+
+ def run_signalp(seq, sp_headers)
+ Timeout::timeout(300) do
+ cmd = "echo '>seq\n#{seq}\n' | #{opt[:signalp_path]} -t euk" \
+ " -f short -U 0.34 -u 0.34"
+ stdin, stdout, stderr, wait_thr = Open3.popen3(cmd)
+ out = stdout.gets(nil).split("\n").delete_if { |l| l[0] == '#' }
+ stdin.close; stdout.close; stderr.close
+ result = out[0] + ' ' + seq
+ return Hash[sp_headers.map(&:to_sym).zip(result.split)]
  end
- data
+ rescue Timeout::Error
+ no_results = [0,0,1,1,1,1,1,1,1,'N',1,1, seq]
+ return Hash[sp_headers.map(&:to_sym).zip(no_results)]
  end
 
- def create_orf_fasta(m_orf)
- fasta = ''
- m_orf.each_with_index { |seq, idx| fasta << ">#{idx}\n#{seq}\n" }
- fasta
+ def setup_analysis(seq)
+ orfs = seq.scan(/(?=(M\w{#{opt[:min_orf_length]},}))./).flatten
+ (opt[:type] == :protein || orfs.empty? || orfs.nil?) ? [seq] : orfs
  end
  end
  end
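The new `setup_analysis` relies on a lookahead so that overlapping candidates are all reported. The sketch below isolates that regular expression; `candidate_orfs` is a hypothetical name used only for illustration, and the input string is made up. Note that `\w{n,}` counts the characters after the leading `M`, so each match is at least `n + 1` residues long, and that a stop symbol such as `*` is not a word character and therefore ends a match.

```ruby
# Hypothetical helper: return every stretch that starts at a methionine (M)
# and is followed by at least `min_length` further word characters.
# The lookahead `(?=...)` consumes nothing, so matches may overlap; the
# trailing `.` moves the scan forward by one character after each match.
def candidate_orfs(protein, min_length)
  protein.scan(/(?=(M\w{#{min_length},}))./).flatten
end

# Made-up input: the second M starts a shorter candidate nested in the first,
# and the final "MA" is too short to qualify.
p candidate_orfs('AAMKLMVV*MA', 2) # prints ["MKLMVV", "MVV"]
```

When no candidate is found the result is an empty array, which is the case the hunk above handles by falling back to the whole sequence.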
@@ -1,4 +1,4 @@
  # Top level module / namespace.
  module NpSearch
- VERSION = '2.1.0'.freeze
+ VERSION = '2.1.1'.freeze
  end
@@ -5,7 +5,7 @@ html lang="en"
  meta content="IE=edge" http-equiv="X-UA-Compatible"
  meta content="width=device-width, initial-scale=1" name="viewport"
  meta content="NpSearch | Identify Novel Neuropeptides" name="description"
- meta content="Wurmlab" name="author"
+ meta content="Moghul et al." name="author"
  title NpSearch | Identify Novel Neuropeptides
  css:
  html { position: relative; min-height: 100%; }
@@ -28,10 +28,7 @@ html lang="en"
  - @sorted_sequences.each do |seq|
  p.sequence
  span.id
- - if @opt[:type] == :protein
- | >#{seq.defline}
- - elsif @opt[:type] == :nucleotide
- | >#{seq.defline}-(frame:#{seq.translated_frame})
+ | >#{seq.defline}
  br
  span.seq== seq.html_seq
  br
@@ -39,13 +36,15 @@ html lang="en"
  br
  footer
  p
- | Please cite "Moghul I, Rowe M, Priyam A, Elphick M &amp; Wurm Y
+ | Please cite "Moghul
  em
- | (in prep)
- | NpSearch: A tool to identify novel neuropeptides"
+ | et al. (in prep)
+ | NpSearch: Identify Novel Neuropeptides"
  br
  | Developed at
- a href="https://wurmlab.github.io" target="_blank" Wurm Lab
+ a href="https://wurmlab.github.io" target="_blank" Wurm Lab
+ | &amp;
+ a href="http://www.sbcs.qmul.ac.uk/staff/mauriceelphick.html" target="_blank" Elphick Lab
  | ,
  a href="http://www.sbcs.qmul.ac.uk" target="_blank" QMUL
  br
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: npsearch
  version: !ruby/object:Gem::Version
- version: 2.1.0
+ version: 2.1.1
  platform: ruby
  authors:
  - Ismail Moghul
@@ -12,7 +12,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-10-04 00:00:00.000000000 Z
+ date: 2016-11-11 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler
@@ -117,6 +117,8 @@ files:
  - README.md
  - Rakefile
  - bin/npsearch
+ - exemplar_data/README.md
+ - exemplar_data/genetic_data.fa
  - lib/npsearch.rb
  - lib/npsearch/arg_validator.rb
  - lib/npsearch/output.rb