transrate 0.0.1 → 0.0.2
- data/Gemfile.lock +2 -2
- data/README.md +90 -3
- data/lib/transrate.rb +1 -0
- data/lib/transrate/assembly.rb +12 -1
- data/lib/transrate/comparative_metrics.rb +10 -5
- data/lib/transrate/read_metrics.rb +3 -0
- data/lib/transrate/transrater.rb +19 -0
- data/lib/transrate/version.rb +1 -1
- metadata +3 -2
data/Gemfile.lock
CHANGED
@@ -2,7 +2,7 @@ PATH
   remote: .
   specs:
     transrate (0.0.1)
-      bettersam
+      bettersam (~> 0.0.1.alpha)
       bio
       rake (~> 10.1.0)
       trollop (~> 2.0)
@@ -13,7 +13,7 @@ GEM
   specs:
     ansi (1.4.3)
     bettersam (0.0.1.alpha)
-    bio (1.4.3)
+    bio (1.4.3.0001)
     colorize (0.5.8)
     coveralls (0.6.7)
       colorize
data/README.md
CHANGED
@@ -1,8 +1,42 @@
 Transrate
 ----
 
-Quality
+Quality analysis and comparison of transcriptome assemblies.
 
+## Transcriptome assembly quality metrics
+
+**transrate** implements a variety of established and new metrics.
+
+note: this list will be expanded soon with detailed explanations and a guide to interpreting the results.
+
+### Contig metrics
+
+* **n_seqs** - the number of contigs in the assembly
+* **smallest** - the size of the smallest contig
+* **largest** - the size of the largest contig
+* **n_bases** - the number of bases included in the assembly
+* **mean_len** - the mean length of the contigs
+* **n > 1k** - the number of contigs greater than 1,000 bases long
+* **n > 10k** - the number of contigs greater than 10,000 bases long
+* **nX** - the largest contig size at which at least X% of bases are contained in contigs *at least* this long
+
+### Read mapping metrics
+
+* **total** - the total number of read pairs mapping
+* **good** - the number of read pairs mapping in a way indicative of good assembly
+* **bad** - the number of read pairs mapping in a way indicative of bad assembly
+
+'Good' pairs are those where both members are aligned, in the correct orientation, either on the same contig or within a plausible distance of the ends of two separate contigs.
+
+Conversely, 'bad' pairs are those where at least one of the conditions for being 'good' is not met.
+
+Additionally, the software calculates whether there is any evidence in the read mappings that different contigs originate from the same transcript. These theoretical links are called bridges, and the number of bridges is shown in the **supported bridges** metric. The list of supported bridges is output to a file, `supported_bridges.csv`, in case you want to make use of the information. At a later date, transrate will include the ability to improve the assembly using this and other information.
+
+### Comparative metrics
+
+* **reciprocal hits** - the number of reciprocal best hits against the reference using ublast. A high score indicates that a large number of real transcripts have been assembled.
+* **ortholog hit ratio** - the mean ratio of alignment length to reference sequence length. A high score on this metric indicates the assembly contains full-length transcripts.
+* **collapse factor** - the mean number of reference proteins mapping to each contig. A high score on this metric indicates the assembly contains chimeras.
 
 ## Installation
 
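The **nX** metric in the README above can be illustrated with a small sketch. This is not the gem's own implementation: the method name `n_x` and the example lengths are hypothetical, and it assumes nX means the largest length L such that contigs of length L or more hold at least X% of all bases.

```ruby
# Hypothetical helper (not part of transrate): compute nX from contig lengths.
# Assumes nX is the largest length L such that contigs of length >= L
# together contain at least X% of all bases.
def n_x(lengths, x)
  raise ArgumentError, 'x must be between 0 and 100' unless (0..100).cover?(x)
  return nil if lengths.empty?

  threshold = lengths.sum * x / 100.0
  cumulative = 0
  # Walk from the longest contig down, accumulating bases as we go.
  lengths.sort.reverse.each do |len|
    cumulative += len
    return len if cumulative >= threshold
  end
end

lengths = [100, 200, 300, 400, 1000]
puts n_x(lengths, 50) # N50 of the example lengths
puts n_x(lengths, 90) # N90 of the example lengths
```

Sorting once and scanning from the longest contig keeps the sketch simple; for several values of X on a large assembly it would be cheaper to sort once and reuse the sorted list.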
@@ -14,6 +48,55 @@ If that doesn't work, check the requirements below...
 
 ## Usage
 
+`transrate --help` will give you...
+
+```
+Transrate v0.0.1a by Richard Smith <rds45@cam.ac.uk>
+
+DESCRIPTION:
+Analyse a de-novo transcriptome
+assembly using three kinds of metrics:
+
+1. contig-based
+2. read-mapping
+3. reference-based
+
+Please make sure USEARCH and bowtie2 are both installed
+and in the PATH.
+
+Bug reports and feature requests at:
+http://github.com/blahah/transrate
+
+USAGE:
+transrate <options>
+
+OPTIONS:
+--assembly, -a <s>: assembly file in FASTA format
+--reference, -r <s>: reference proteome file in FASTA format
+--left, -l <s>: left reads file in FASTQ format
+--right, -i <s>: right reads file in FASTQ format
+--insertsize, -n <i>: mean insert size (default: 200)
+--insertsd, -s <i>: insert size standard deviation (default: 50)
+--threads, -t <i>: number of threads to use (default: 8)
+--version, -v: Print version and exit
+--help, -h: Show this message
+```
+
+If you don't include --left and --right read files, the read-mapping based analysis will be skipped. I recommend that you don't align all your reads - just a subset of 500,000 will give you a very good idea of the quality. You can get a subset by running (on a linux system):
+
+`head -2000000 readfile.fastq`
+
+FASTQ records are 4 lines long, so make sure you multiply the number of reads you want by 4, and be sure to run the same command on both the left and right read files.
+
+### Example
+
+```
+transrate --assembly assembly.fasta \
+--reference reference.fasta \
+--left l.fq \
+--right r.fq \
+--threads 4
+```
 
 ## Requirements
 
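The subsetting advice in the README above (four lines per FASTQ record, same count for both mates) can also be sketched in Ruby for systems without `head`. The helper name `take_fastq_records` and the file names in the comments are hypothetical, and like `head` this simply keeps the first records rather than sampling at random.

```ruby
# Hypothetical helper (not part of transrate): copy the first n FASTQ
# records (4 lines each) from input to output.
# Returns the number of lines written.
def take_fastq_records(input, output, n)
  written = 0
  File.open(output, 'w') do |out|
    File.foreach(input) do |line|
      break if written >= n * 4 # stop once n complete records are copied
      out.write(line)
      written += 1
    end
  end
  written
end

# Use the same n for both mates so the pairs stay in sync, e.g.:
#   take_fastq_records('l.fq', 'l.subset.fq', 500_000)
#   take_fastq_records('r.fq', 'r.subset.fq', 500_000)
```

Reading line by line keeps memory use flat however large the read files are.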
@@ -23,7 +106,7 @@ First, you'll need Ruby v1.9.3 or greater installed. You can check with:
 
 `ruby --version`
 
-If you don't have Ruby installed, or you need a higher version, I recommend using [RVM](http://rvm.io/) as your Ruby Version Manager. To install RVM along with the latest
+If you don't have Ruby installed, or you need a higher version, I recommend using [RVM](http://rvm.io/) as your Ruby Version Manager. To install RVM along with the latest Ruby, just run:
 
 `\curl -L https://get.rvm.io | bash -s stable`
 
@@ -35,6 +118,10 @@ Your Ruby installation *should* come with RubyGems, the package manager for Ruby
 
 If you don't have it installed, I recommend installing the latest version of Ruby and RubyGems using the RVM instructions above (in the Requirements:Ruby section).
 
+### Usearch and Bowtie2
+
+Usearch (http://drive5.com/usearch) and Bowtie2 (https://sourceforge.net/projects/bowtie-bio/files/bowtie2) must be installed and in your PATH. Additionally, the Usearch binary executable should be named `usearch`.
+
 ## Development status
 
-This software is in very early development. Nevertheless, we welcome bug reports.
+This software is in very early development. Nevertheless, we welcome bug reports.
data/lib/transrate.rb
CHANGED
data/lib/transrate/assembly.rb
CHANGED
@@ -19,9 +19,12 @@ class Assembly
     # assembly filename
     attr_accessor :file
 
+    # assembly n50
+    attr_reader :n50
+
     # Return a new Assembly.
     #
-    # - +:
+    # - +:file+ - path to the assembly FASTA file
     def initialize file
       @file = file
       @assembly = []
@@ -40,6 +43,14 @@ class Assembly
       a.basic_stats
     end
 
+    def run
+      stats = self.basic_stats
+      stats.each_pair do |key, value|
+        ivar = "@#{key.gsub(/ /, '_')}".to_sym
+        self.instance_variable_set(ivar, value)
+      end
+    end
+
     # Return a hash of statistics about this assembly
     def basic_stats
       cumulative_length = 0.0
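The `run` method added in this hunk copies each entry of the stats hash into an instance variable. A minimal sketch of that pattern follows, using a hypothetical `StatsHolder` class rather than the real `Assembly`. It assumes keys may be strings or symbols and may contain characters such as spaces or `>`, which are replaced so the name is a valid identifier.

```ruby
# Hypothetical stand-in (not Transrate::Assembly): expose the entries of
# a stats hash as instance variables.
class StatsHolder
  attr_reader :n_seqs, :mean_len

  def initialize(stats)
    @stats = stats
  end

  def run
    @stats.each_pair do |key, value|
      # Replace every run of non-word characters with an underscore so
      # the result is a legal instance variable name.
      name = key.to_s.gsub(/\W+/, '_')
      instance_variable_set("@#{name}", value)
    end
    self
  end
end

holder = StatsHolder.new('n_seqs' => 3, 'mean len' => 250.0).run
puts holder.n_seqs
puts holder.mean_len
```

A key that starts with a digit would still raise a `NameError` here, so a real implementation might prefer to keep such values in the hash.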
data/lib/transrate/comparative_metrics.rb
CHANGED
@@ -4,6 +4,8 @@ module Transrate
 
   class ComparativeMetrics
 
+    attr_accessor :reciprocal_hits
+
     def initialize assembly, reference
       @assembly = assembly
       @reference = reference
@@ -12,12 +14,13 @@ module Transrate
 
     def run
       rbu = self.reciprocal_best_ublast
-
-
+      @ortholog_hit_ratio = self.ortholog_hit_ratio rbu
+      @collapse_factor = self.collapse_factor @ra.l2r_hits
+      @reciprocal_hits = rbu.size
       {
-        :reciprocal_hits =>
-        :ortholog_hit_ratio =>
-        :collapse_factor =>
+        :reciprocal_hits => @reciprocal_hits,
+        :ortholog_hit_ratio => @ortholog_hit_ratio,
+        :collapse_factor => @collapse_factor
       }
     end
 
@@ -27,10 +30,12 @@ module Transrate
     end
 
     def ortholog_hit_ratio rbu
+      return @ortholog_hit_ratio unless @ortholog_hit_ratio.nil?
       rbu.reduce(0.0){ |sum, hit| sum += hit.last.tcov.to_f } / rbu.size
     end
 
     def collapse_factor hits
+      return @collapse_factor unless @collapse_factor.nil?
       targets = {}
       hits.each_pair do |query, hit|
         unless targets.has_key? query
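The two metrics touched in these hunks can be sketched on plain data. This is an illustration under stated assumptions, not the gem's code: the `Hit` struct and its field names are hypothetical, `tcov` is assumed to be the fraction of the reference sequence covered by the alignment, and each hit is assumed to run from a contig (`query`) to a reference protein (`target`).

```ruby
# Hypothetical hit record (not the gem's hit class).
Hit = Struct.new(:query, :target, :tcov)

# Mean target coverage across a list of hits.
def ortholog_hit_ratio(hits)
  return 0.0 if hits.empty?
  hits.sum { |hit| hit.tcov.to_f } / hits.size
end

# Mean number of distinct reference sequences hit by each contig.
def collapse_factor(hits)
  return 0.0 if hits.empty?
  refs_per_contig = Hash.new { |hash, key| hash[key] = [] }
  hits.each { |hit| refs_per_contig[hit.query] << hit.target }
  counts = refs_per_contig.values.map { |refs| refs.uniq.size }
  counts.sum.to_f / counts.size
end

hits = [
  Hit.new('contig1', 'protA', 1.0),
  Hit.new('contig1', 'protB', 0.5),
  Hit.new('contig2', 'protC', 0.75)
]
puts ortholog_hit_ratio(hits)
puts collapse_factor(hits)
```

Returning `0.0` for an empty hit list is a choice made for this sketch. The method in the hunk divides by `rbu.size` directly, so it would return `NaN` for an empty list.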
data/lib/transrate/transrater.rb
ADDED
@@ -0,0 +1,19 @@
+module Transrate
+
+  class Transrater
+
+    def initialize path
+      @assembly = Assembly.new path
+      @read_metrics = ReadMetrics.new @assembly
+      @comparative_metrics = ComparativeMetrics.new @assembly
+    end
+
+    def run
+      @assembly.run
+      @read_metrics.run
+      @comparative_metrics.run
+    end
+
+  end # Transrater
+
+end # Transrate
data/lib/transrate/version.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: transrate
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.2
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-09-
+date: 2013-09-19 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -176,6 +176,7 @@ files:
 - lib/transrate/rb_hit.rb
 - lib/transrate/read_metrics.rb
 - lib/transrate/reciprocal_annotation.rb
+- lib/transrate/transrater.rb
 - lib/transrate/usearch.rb
 - lib/transrate/version.rb
 - transrate.gemspec