transrate 0.0.10 → 0.0.12
- checksums.yaml +4 -4
- data/.gitignore +3 -1
- data/LICENSE +18 -1
- data/README.md +70 -47
- data/Rakefile +8 -0
- data/bin/transrate +54 -48
- data/lib/transrate.rb +4 -0
- data/lib/transrate/assembly.rb +165 -37
- data/lib/transrate/bowtie2.rb +2 -2
- data/lib/transrate/comparative_metrics.rb +7 -0
- data/lib/transrate/dimension_reduce.rb +1 -0
- data/lib/transrate/express.rb +2 -2
- data/lib/transrate/metric.rb +1 -1
- data/lib/transrate/read_metrics.rb +10 -4
- data/lib/transrate/reciprocal_annotation.rb +1 -0
- data/lib/transrate/transrater.rb +34 -9
- data/lib/transrate/usearch.rb +7 -2
- data/lib/transrate/version.rb +1 -1
- data/lib/transrate/writer.rb +18 -0
- data/test/helper.rb +16 -0
- data/transrate.gemspec +5 -5
- metadata +35 -33
- data/lib/transrate/#assembly.rb# +0 -130
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f5f7d2d65376b69682c5e29c318ad35f43a5ea9a
+  data.tar.gz: 794238eafb17705f68d82296e53ffa6128bf7141
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 101280a09d847f28165d0a4394bb849af5e339bf782a25b7e09ad45e1fbdd694f441809b09f078848c69ff0607bedc1aff91e87c50839cd0be3a997038f381a8
+  data.tar.gz: 1cf8a710b6e7d83139eabd4b8d820a056de19715307b822c3096458cefdec89f195d0727a8b49ccc5ac648bba9e1e8ec007092abcc94796e5a3f6b3ba4c6df99
data/.gitignore
CHANGED
data/LICENSE
CHANGED
@@ -1,4 +1,11 @@
-
+## Summary
+
+The Ruby code for Transrate is released under the MIT license.
+
+SNAP and CD-HIT-2D are bundled as binaries under their respective licenses
+as described below.
+
+## The MIT License (MIT)
 
 Copyright (c) 2013 Richard Smith
 
@@ -18,3 +25,13 @@ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
 COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
 IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+## SNAP
+
+SNAP is distributed as a binary in accordance with its Apache license.
+The source code for SNAP is available at https://github.com/amplab/snap
+
+## CD-HIT-2D
+
+CD-HIT-2D is distributed as a binary in accordance with its GPLv2 license.
+The source code for CD-HIT-2D is available at https://code.google.com/p/cdhit/
data/README.md
CHANGED
@@ -3,55 +3,57 @@ Transrate
|
|
3
3
|
|
4
4
|
Quality analysis and comparison of transcriptome assemblies.
|
5
5
|
|
6
|
-
##
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
6
|
+
## Contents
|
7
|
+
|
8
|
+
1. [Development status](https://github.com/Blahah/transrate#development-status)
|
9
|
+
2. [Transcriptome assembly quality metrics](https://github.com/Blahah/transrate#transcriptome-assembly-quality-metrics)
|
10
|
+
3. [Installation](https://github.com/Blahah/transrate#installation)
|
11
|
+
4. [Usage](https://github.com/Blahah/transrate#usage)
|
12
|
+
- [Command line](https://github.com/Blahah/transrate#command-line)
|
13
|
+
- [example](https://github.com/Blahah/transrate#example)
|
14
|
+
- [As a library](https://github.com/Blahah/transrate#as-a-library)
|
15
|
+
5. [Requirements](https://github.com/Blahah/transrate#requirements)
|
16
|
+
- [Ruby](https://github.com/Blahah/transrate#ruby)
|
17
|
+
- [RubyGems](https://github.com/Blahah/transrate#rubygems)
|
18
|
+
- [USEARCH, Bowtie 2, and eXpress](https://github.com/Blahah/transrate#usearch-bowtie2-and-express)
|
19
|
+
6. [Getting help](https://github.com/Blahah/transrate#getting-help)
|
20
|
+
|
21
|
+
## Development status
|
22
|
+
|
23
|
+
This software is in early development. Users should be aware that until the first release is made, features may change faster than the documentation is updated. Nevertheless, we welcome bug reports.
|
24
|
+
|
25
|
+
[![Gem Version](https://badge.fury.io/rb/transrate.png)][gem]
|
26
|
+
[![Build Status](https://secure.travis-ci.org/Blahah/transrate.png?branch=master)][travis]
|
27
|
+
[![Dependency Status](https://gemnasium.com/Blahah/transrate.png?travis)][gemnasium]
|
28
|
+
[![Code Climate](https://codeclimate.com/github/Blahah/transrate.png)][codeclimate]
|
29
|
+
[![Coverage Status](https://coveralls.io/repos/Blahah/transrate/badge.png?branch=master)][coveralls]
|
30
|
+
|
31
|
+
[gem]: https://badge.fury.io/rb/transrate
|
32
|
+
[travis]: https://travis-ci.org/Blahah/transrate
|
33
|
+
[gemnasium]: https://gemnasium.com/Blahah/transrate
|
34
|
+
[codeclimate]: https://codeclimate.com/github/Blahah/transrate
|
35
|
+
[coveralls]: https://coveralls.io/r/Blahah/transrate
|
24
36
|
|
25
|
-
|
26
|
-
* **good** - the number of read pairs mapping in a way indicative of good assembly
|
27
|
-
* **bad** - the number of reads pairs mapping in a way indicative of bad assembly
|
28
|
-
|
29
|
-
'Good' pairs are those where both members are aligned, in the correct orientation, either on the same contig or within a plausible distance of the ends of two separate contigs.
|
30
|
-
|
31
|
-
Conversely, 'bad' pairs are those where one of the conditions for being 'good' are not met.
|
32
|
-
|
33
|
-
Additionally, the software calculates whether there is any evidence in the read mappings that different contigs originate from the same transcript. These theoretical links are called bridges, and the number of bridges is shown in the **supported bridges** metric. The list of supported bridges is output to a file, `supported_bridges.csv`, in case you want to make use of the information. At a later date, transrate will include the ability to improve the assembly using this and other information.
|
34
|
-
|
35
|
-
### Comparative metrics
|
37
|
+
## Transcriptome assembly quality metrics
|
36
38
|
|
37
|
-
|
38
|
-
* **ortholog hit ratio** - the mean ratio of alignment length to reference sequence length. A low score on this metric indicates the assembly contains full-length transcripts.
|
39
|
-
* **collapse factor** - the mean number of reference proteins mapping to each contig. A high score on this metric indicates the assembly contains chimeras.
|
39
|
+
**transrate** implements a variety of established and new metrics. They are explained in detail [on the wiki](https://github.com/Blahah/transrate/wiki/Transcriptome-assembly-quality-metrics).
|
40
40
|
|
41
41
|
## Installation
|
42
42
|
|
43
|
-
|
43
|
+
Assuming all the requirements are met (see below), you can install transrate very easily. Just run at the terminal:
|
44
44
|
|
45
45
|
`gem install transrate`
|
46
46
|
|
47
|
-
If
|
47
|
+
If you're new to linux/unix, there's a detailed tutorial for installing transrate with all the dependencies [on my blog](http://blahah.net/bioinformatics/2013/10/19/installing-transrate/).
|
48
48
|
|
49
49
|
## Usage
|
50
50
|
|
51
|
+
### Command line
|
52
|
+
|
51
53
|
`transrate --help` will give you...
|
52
54
|
|
53
55
|
```
|
54
|
-
Transrate v0.0.
|
56
|
+
Transrate v0.0.10 by Richard Smith <rds45@cam.ac.uk>
|
55
57
|
|
56
58
|
DESCRIPTION:
|
57
59
|
Analyse a de-novo transcriptome
|
@@ -61,7 +63,7 @@ assembly using three kinds of metrics:
|
|
61
63
|
2. read-mapping
|
62
64
|
3. reference-based
|
63
65
|
|
64
|
-
Please make sure USEARCH and
|
66
|
+
Please make sure USEARCH, bowtie 2 and eXpress are installed
|
65
67
|
and in the PATH.
|
66
68
|
|
67
69
|
Bug reports and feature requests at:
|
@@ -84,18 +86,37 @@ OPTIONS:
|
|
84
86
|
|
85
87
|
If you don't include --left and --right read files, the read-mapping based analysis will be skipped. I recommend that you don't align all your reads - just a subset of 500,000 will give you a very good idea of the quality. You can get a subset by running (on a linux system):
|
86
88
|
|
87
|
-
`head -2000000
|
89
|
+
`head -2000000 left.fastq > left_500k.fastq`
|
90
|
+
|
91
|
+
`head -2000000 right.fastq > right_500k.fastq`
|
88
92
|
|
89
93
|
FASTQ records are 4 lines long, so make sure you multiply the number of reads you want by 4, and be sure to run the same command on both the left and right read files.
|
90
94
|
|
91
|
-
|
95
|
+
#### Example
|
92
96
|
|
93
97
|
```
|
94
98
|
transrate --assembly assembly.fasta \
|
95
|
-
|
96
|
-
|
97
|
-
|
98
|
-
|
99
|
+
--reference reference.fasta \
|
100
|
+
--left l.fq \
|
101
|
+
--right r.fq \
|
102
|
+
--threads 4
|
103
|
+
```
|
104
|
+
|
105
|
+
### As a library
|
106
|
+
|
107
|
+
```ruby
|
108
|
+
require 'transrate'
|
109
|
+
|
110
|
+
assembly = Transrate::Assembly.new(File.expand_path('assembly.fasta'))
|
111
|
+
reference = Transrate::Assembly.new(File.expand_path('reference.fasta'))
|
112
|
+
|
113
|
+
t = Transrate::Transrater.new(assembly, reference)
|
114
|
+
|
115
|
+
left = File.expand_path('left.fq')
|
116
|
+
right = File.expand_path('right.fq')
|
117
|
+
|
118
|
+
puts t.all_metrics(left, right)
|
119
|
+
puts t.assembly_score
|
99
120
|
```
|
100
121
|
|
101
122
|
## Requirements
|
@@ -116,12 +137,14 @@ Your Ruby installation *should* come with RubyGems, the package manager for Ruby
|
|
116
137
|
|
117
138
|
`gem --version`
|
118
139
|
|
119
|
-
If you don't have it installed, I recommend installing the latest version of Ruby and RubyGems using the RVM instructions above (in the Requirements:Ruby section.
|
140
|
+
If you don't have it installed, I recommend installing the latest version of Ruby and RubyGems using the RVM instructions above (in the [Requirements:Ruby](https://github.com/Blahah/transrate#ruby) section).
|
141
|
+
|
142
|
+
### Usearch, Bowtie2 and eXpress
|
120
143
|
|
121
|
-
|
144
|
+
Usearch (http://drive5.com/usearch), Bowtie2 (https://sourceforge.net/projects/bowtie-bio/files/bowtie2) and eXpress (http://bio.math.berkeley.edu/eXpress/) must be installed and in your PATH. Additionally, the Usearch binary executable should be named `usearch`.
|
122
145
|
|
123
|
-
|
146
|
+
## Getting help
|
124
147
|
|
125
|
-
|
148
|
+
If you need help using transrate, please post to the [forum here](https://groups.google.com/forum/#!forum/transrate-users).
|
126
149
|
|
127
|
-
|
150
|
+
If you think you've found a bug, please post it to the [issues list](https://github.com/Blahah/transrate/issues).
|
data/Rakefile
ADDED
data/bin/transrate
CHANGED
@@ -4,21 +4,18 @@ require 'trollop'
|
|
4
4
|
require 'transrate'
|
5
5
|
|
6
6
|
opts = Trollop::options do
|
7
|
-
version
|
7
|
+
version Transrate::VERSION::STRING.dup
|
8
8
|
banner <<-EOS
|
9
9
|
|
10
|
-
Transrate
|
10
|
+
Transrate v#{Transrate::VERSION::STRING.dup} by Richard Smith <rds45@cam.ac.uk>
|
11
11
|
|
12
12
|
DESCRIPTION:
|
13
13
|
Analyse a de-novo transcriptome
|
14
14
|
assembly using three kinds of metrics:
|
15
15
|
|
16
16
|
1. contig-based
|
17
|
-
2. read-mapping
|
18
|
-
3. reference-based
|
19
|
-
|
20
|
-
Please make sure USEARCH and bowtie2 are both installed
|
21
|
-
and in the PATH.
|
17
|
+
2. read-mapping (if --left and --right are provided)
|
18
|
+
3. reference-based (if --reference is provided)
|
22
19
|
|
23
20
|
Bug reports and feature requests at:
|
24
21
|
http://github.com/blahah/transrate
|
@@ -30,7 +27,7 @@ OPTIONS:
|
|
30
27
|
|
31
28
|
EOS
|
32
29
|
opt :assembly, "assembly file in FASTA format", :required => true, :type => String
|
33
|
-
opt :reference, "reference proteome file in FASTA format", :
|
30
|
+
opt :reference, "reference proteome file in FASTA format", :type => String
|
34
31
|
opt :left, "left reads file in FASTQ format", :type => String
|
35
32
|
opt :right, "right reads file in FASTQ format", :type => String
|
36
33
|
opt :insertsize, "mean insert size", :default => 200, :type => Integer
|
@@ -45,59 +42,68 @@ end
|
|
45
42
|
include Transrate
|
46
43
|
|
47
44
|
a = Assembly.new opts.assembly
|
48
|
-
r = Assembly.new
|
45
|
+
r = opts.reference ? Assembly.new(opts.reference) : nil
|
49
46
|
|
50
|
-
|
47
|
+
transrater = Transrater.new(a, r,
|
48
|
+
opts.left,
|
49
|
+
opts.right,
|
50
|
+
opts.insertsize,
|
51
|
+
opts.insertsd)
|
51
52
|
|
52
|
-
puts "
|
53
|
-
t0 = Time.now
|
54
|
-
contig_results = a.basic_stats
|
55
|
-
puts "...done in #{Time.now - t0} seconds"
|
53
|
+
puts "\nAnalysing assembly: #{opts.assembly}\n\n"
|
56
54
|
|
57
|
-
|
58
|
-
if (opts.left && opts.right)
|
59
|
-
puts "\ncalculating read diagnostics..."
|
60
|
-
t0 = Time.now
|
61
|
-
read_metrics = ReadMetrics.new a
|
62
|
-
read_metrics.run(opts.left, opts.right)
|
63
|
-
read_results = read_metrics.read_stats
|
64
|
-
puts "...done in #{Time.now - t0} seconds"
|
65
|
-
else
|
66
|
-
puts "\nno reads provided, skipping read diagnostics"
|
67
|
-
end
|
55
|
+
report_width = 30
|
68
56
|
|
69
|
-
puts "
|
57
|
+
puts "Calculating contig metrics..."
|
70
58
|
t0 = Time.now
|
71
|
-
|
72
|
-
comparative_metrics.run
|
73
|
-
comparative_results = comparative_metrics.comp_stats
|
74
|
-
puts "...done in #{Time.now - t0} seconds"
|
75
|
-
|
76
|
-
report_width = 30
|
59
|
+
contig_results = transrater.assembly_metrics.basic_stats
|
77
60
|
|
78
61
|
if contig_results
|
79
|
-
puts "\n
|
62
|
+
puts "\n"
|
80
63
|
puts "Contig metrics:"
|
81
64
|
puts "-" * report_width
|
82
65
|
puts pretty_print_hash(contig_results, report_width)
|
83
66
|
end
|
84
67
|
|
85
|
-
|
86
|
-
|
87
|
-
|
88
|
-
|
89
|
-
puts
|
68
|
+
puts "Contig metrics done in #{Time.now - t0} seconds"
|
69
|
+
|
70
|
+
read_results = nil
|
71
|
+
if (opts.left && opts.right)
|
72
|
+
puts "\ncalculating read diagnostics..."
|
73
|
+
t0 = Time.now
|
74
|
+
read_results = transrater.read_metrics(opts.left, opts.right).read_stats
|
75
|
+
|
76
|
+
if read_results
|
77
|
+
puts "\n"
|
78
|
+
puts "Read mapping metrics:"
|
79
|
+
puts "-" * report_width
|
80
|
+
puts pretty_print_hash(read_results, report_width)
|
81
|
+
end
|
82
|
+
|
83
|
+
puts "Read metrics done in #{Time.now - t0} seconds"
|
84
|
+
else
|
85
|
+
puts "\nNo reads provided, skipping read diagnostics"
|
90
86
|
end
|
91
87
|
|
92
|
-
if
|
93
|
-
puts "\
|
94
|
-
|
95
|
-
|
96
|
-
|
88
|
+
if opts.reference
|
89
|
+
puts "\nCalculating comparative metrics..."
|
90
|
+
t0 = Time.now
|
91
|
+
comparative_results = transrater.comparative_metrics.comp_stats
|
92
|
+
|
93
|
+
if comparative_results
|
94
|
+
puts "\n"
|
95
|
+
puts "Comparative metrics:"
|
96
|
+
puts "-" * report_width
|
97
|
+
puts pretty_print_hash(comparative_results, report_width)
|
98
|
+
end
|
99
|
+
|
100
|
+
puts "Comparative metrics done in #{Time.now - t0} seconds"
|
97
101
|
end
|
98
102
|
|
99
|
-
|
100
|
-
|
101
|
-
|
102
|
-
|
103
|
-
puts "
|
103
|
+
puts "\n"
|
104
|
+
puts "-" * report_width
|
105
|
+
score = transrater.assembly_score
|
106
|
+
unless score.nil?
|
107
|
+
puts "OVERALL SCORE: #{score.to_f.round(2) * 100}%"
|
108
|
+
puts "-" * report_width
|
109
|
+
end
|
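Note on the score line in `bin/transrate`: it prints `score.to_f.round(2) * 100`, and multiplying an already rounded float by 100 can surface binary floating-point noise in the printed percentage. A minimal sketch of one alternative, scaling before rounding, follows. The helper name `format_score` is hypothetical and is not part of transrate.

```ruby
# Hypothetical helper, not part of transrate: format a 0..1 score as a
# whole-number percentage, or return nil when no score is available.
def format_score(score)
  return nil if score.nil?
  # scale first, then round, so the value printed is an Integer
  "#{(score.to_f * 100).round}%"
end

puts format_score(0.5678)
```

Returning `nil` for a missing score mirrors the `unless score.nil?` guard in the script, so the caller can still decide whether to print anything.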
data/lib/transrate.rb
CHANGED
data/lib/transrate/assembly.rb
CHANGED
@@ -9,12 +9,13 @@ module Transrate
|
|
9
9
|
|
10
10
|
include Enumerable
|
11
11
|
extend Forwardable
|
12
|
-
def_delegators :@assembly, :each,
|
12
|
+
def_delegators :@assembly, :each, :<<, :size, :length
|
13
13
|
|
14
14
|
attr_accessor :ublast_db
|
15
15
|
attr_accessor :orfs_ublast_db
|
16
16
|
attr_accessor :protein
|
17
17
|
attr_reader :assembly
|
18
|
+
attr_reader :has_run
|
18
19
|
|
19
20
|
# number of bases in the assembly
|
20
21
|
attr_writer :n_bases
|
@@ -25,7 +26,7 @@ module Transrate
|
|
25
26
|
# assembly n50
|
26
27
|
attr_reader :n50
|
27
28
|
|
28
|
-
#
|
29
|
+
# Return a new Assembly.
|
29
30
|
#
|
30
31
|
# - +:file+ - path to the assembly FASTA file
|
31
32
|
def initialize file
|
@@ -36,71 +37,198 @@ module Transrate
|
|
36
37
|
@n_bases += entry.length
|
37
38
|
@assembly << entry
|
38
39
|
end
|
39
|
-
@assembly.sort_by! { |x| x.length }
|
40
40
|
end
|
41
41
|
|
42
42
|
# Return a new Assembly object by loading sequences
|
43
43
|
# from the FASTA-format +:file+
|
44
|
-
def self.
|
44
|
+
def self.stats_from20_fasta file
|
45
45
|
a = Assembly.new file
|
46
46
|
a.basic_stats
|
47
47
|
end
|
48
48
|
|
49
|
-
def run
|
50
|
-
stats = self.basic_stats
|
49
|
+
def run threads=8
|
50
|
+
stats = self.basic_stats threads
|
51
51
|
stats.each_pair do |key, value|
|
52
|
-
ivar = "@#{key.gsub(
|
52
|
+
ivar = "@#{key.gsub(/\ /, '_')}".to_sym
|
53
|
+
attr_ivar = "#{key.gsub(/\ /, '_')}".to_sym
|
54
|
+
# creates accessors for the variables in stats
|
55
|
+
singleton_class.class_eval { attr_accessor attr_ivar }
|
53
56
|
self.instance_variable_set(ivar, value)
|
54
57
|
end
|
58
|
+
@has_run = true
|
55
59
|
end
|
56
60
|
|
57
|
-
# Return a hash of statistics about this assembly
|
58
|
-
|
61
|
+
# Return a hash of statistics about this assembly. Stats are
|
62
|
+
# calculated in parallel by splitting the assembly into
|
63
|
+
# equal-sized bins and calling Assembly#basic_bin_stats on each
|
64
|
+
# bin in a separate thread.
|
65
|
+
|
66
|
+
def basic_stats threads=8
|
67
|
+
|
68
|
+
# create a work queue to process contigs in parallel
|
69
|
+
queue = Queue.new
|
70
|
+
|
71
|
+
# split the contigs into equal sized bins, one bin per thread
|
72
|
+
binsize = (@assembly.size / threads.to_f).ceil
|
73
|
+
@assembly.each_slice(binsize) do |bin|
|
74
|
+
queue << bin
|
75
|
+
end
|
76
|
+
|
77
|
+
# a classic threadpool - an Array of threads that allows
|
78
|
+
# us to assign work to each thread and then aggregate their
|
79
|
+
# results when they are all finished
|
80
|
+
threadpool = []
|
81
|
+
|
82
|
+
# assign one bin of contigs to each thread from the queue.
|
83
|
+
# each thread will process its bin of contigs and then wait
|
84
|
+
# for the others to finish.
|
85
|
+
semaphore = Mutex.new
|
86
|
+
stats = []
|
87
|
+
|
88
|
+
threads.times do
|
89
|
+
threadpool << Thread.new do |thread|
|
90
|
+
# keep looping until we run out of bins
|
91
|
+
until queue.empty?
|
92
|
+
|
93
|
+
# use non-blocking pop, so an exception is raised
|
94
|
+
# when the queue runs dry
|
95
|
+
bin = queue.pop(true) rescue nil
|
96
|
+
if bin
|
97
|
+
# calculate basic stats for the bin, storing them
|
98
|
+
# in the current thread so they can be collected
|
99
|
+
# in the main thread.
|
100
|
+
bin_stats = basic_bin_stats bin
|
101
|
+
semaphore.synchronize { stats << bin_stats }
|
102
|
+
end
|
103
|
+
end
|
104
|
+
end
|
105
|
+
end
|
106
|
+
|
107
|
+
# collect the stats calculated in each thread and join
|
108
|
+
# the threads to terminate them
|
109
|
+
threadpool.each(&:join)
|
110
|
+
|
111
|
+
# merge the collected stats and return them
|
112
|
+
merge_basic_stats stats
|
113
|
+
|
114
|
+
end # basic_stats
|
115
|
+
|
116
|
+
|
117
|
+
# Calculate basic statistics in a single thread for a bin
|
118
|
+
# of contigs.
|
119
|
+
#
|
120
|
+
# Basic statistics are:
|
121
|
+
#
|
122
|
+
# - N10, N30, N50, N70, N90
|
123
|
+
# - number of contigs >= 1,000 base pairs long
|
124
|
+
# - number of contigs >= 10,000 base pairs long
|
125
|
+
# - length of the shortest contig
|
126
|
+
# - length of the longest contig
|
127
|
+
# - number of contigs in the bin
|
128
|
+
# - mean contig length
|
129
|
+
# - total number of nucleotides in the bin
|
130
|
+
# - mean % of contig length covered by the longest ORF
|
131
|
+
#
|
132
|
+
# @param [Array] bin An array of Bio::Sequence objects
|
133
|
+
# representing contigs in the assembly
|
134
|
+
|
135
|
+
def basic_bin_stats bin
|
136
|
+
|
137
|
+
# cumulative length is a float so we can divide it
|
138
|
+
# accurately later to get the mean length
|
59
139
|
cumulative_length = 0.0
|
60
|
-
|
61
|
-
x
|
62
|
-
|
63
|
-
|
64
|
-
|
140
|
+
|
141
|
+
# we'll calculate Nx for x in [10, 30, 50, 70, 90]
|
142
|
+
# to do this we create a stack of the x values and
|
143
|
+
# pop the first one to set the first cutoff. when
|
144
|
+
# the cutoff is reached we store the nucleotide length and pop
|
145
|
+
# the next value to set the next cutoff. we take a copy
|
146
|
+
# of the Array so we can use the intact original to collect
|
147
|
+
# the results later
|
148
|
+
# x = [90, 70, 50, 30, 10]
|
149
|
+
# x2 = x.clone
|
150
|
+
# cutoff = x2.pop / 100.0
|
151
|
+
# res = []
|
65
152
|
n1k = 0
|
66
153
|
n10k = 0
|
67
154
|
orf_length_sum = 0
|
68
|
-
|
69
|
-
|
70
|
-
|
71
|
-
|
72
|
-
|
73
|
-
|
74
|
-
|
75
|
-
|
76
|
-
|
77
|
-
|
78
|
-
|
79
|
-
|
80
|
-
|
81
|
-
|
155
|
+
|
156
|
+
# sort the contigs in ascending length order
|
157
|
+
# and iterate over them
|
158
|
+
bin.sort_by! { |c| c.seq.size }
|
159
|
+
bin.each do |contig|
|
160
|
+
|
161
|
+
# increment our long contig counters if this
|
162
|
+
# contig is above the thresholds
|
163
|
+
n1k += 1 if contig.length > 1_000
|
164
|
+
n10k += 1 if contig.length > 10_000
|
165
|
+
|
166
|
+
# add the length of the longest orf to the
|
167
|
+
# running total
|
168
|
+
orf_length_sum += orf_length(contig.seq)
|
169
|
+
|
170
|
+
# increment the cumulative length and check whether the Nx
|
171
|
+
# cutoff has been reached. if it has, store the Nx value and
|
172
|
+
# get the next cutoff
|
173
|
+
cumulative_length += contig.length
|
174
|
+
# if cumulative_length >= @n_bases * cutoff
|
175
|
+
# res << contig.length
|
176
|
+
# if x2.empty?
|
177
|
+
# cutoff=1
|
178
|
+
# else
|
179
|
+
# cutoff = x2.pop / 100.0
|
180
|
+
# end
|
181
|
+
# end
|
82
182
|
end
|
83
183
|
|
184
|
+
# calculate and return the statistics as a hash
|
84
185
|
mean = cumulative_length / @assembly.size
|
85
|
-
|
186
|
+
# ns = Hash[x.map { |n| "N#{n}" }.zip(res)]
|
86
187
|
{
|
87
|
-
"n_seqs" =>
|
88
|
-
"smallest" =>
|
89
|
-
"largest" =>
|
90
|
-
"n_bases" =>
|
188
|
+
"n_seqs" => bin.size,
|
189
|
+
"smallest" => bin.first.length,
|
190
|
+
"largest" => bin.last.length,
|
191
|
+
"n_bases" => n_bases,
|
91
192
|
"mean_len" => mean,
|
92
193
|
"n_1k" => n1k,
|
93
194
|
"n_10k" => n10k,
|
94
|
-
"
|
95
|
-
}
|
96
|
-
|
195
|
+
"orf_percent" => 300 * orf_length_sum / (@assembly.size * mean)
|
196
|
+
}
|
197
|
+
# }.merge ns
|
198
|
+
|
199
|
+
end # basic_bin_stats
|
200
|
+
|
201
|
+
def merge_basic_stats stats
|
202
|
+
# convert the array of hashes into a hash of arrays
|
203
|
+
collect = Hash.new{|h,k| h[k]=[]}
|
204
|
+
stats.each_with_object(collect) do |collect, result|
|
205
|
+
collect.each{ |k, v| result[k] << v }
|
206
|
+
end
|
207
|
+
merged = {}
|
208
|
+
collect.each_pair do |stat, values|
|
209
|
+
if stat == 'orf_percent' || /N[0-9]{2}/ =~ stat
|
210
|
+
# store the mean
|
211
|
+
merged[stat] = values.inject(:+) / values.size
|
212
|
+
elsif stat == 'smallest'
|
213
|
+
merged[stat] = values.min
|
214
|
+
elsif stat == 'largest'
|
215
|
+
merged[stat] = values.max
|
216
|
+
else
|
217
|
+
# store the sum
|
218
|
+
merged[stat] = values.inject(:+)
|
219
|
+
end
|
220
|
+
end
|
97
221
|
|
222
|
+
merged
|
223
|
+
|
224
|
+
end # merge_basic_stats
|
225
|
+
|
98
226
|
# finds longest orf in a sequence
|
99
227
|
def orf_length sequence
|
100
228
|
longest=0
|
101
229
|
(1..6).each do |frame|
|
102
230
|
translated = Bio::Sequence::NA.new(sequence).translate(frame)
|
103
|
-
translated.split(
|
231
|
+
translated.split('*').each do |orf|
|
104
232
|
if orf.length > longest
|
105
233
|
longest=orf.length
|
106
234
|
end
|
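Note on `Assembly#basic_stats`: the binning and thread-pool pattern added in this hunk can be tried in isolation. The sketch below is a simplified stand-in, not transrate's code. The names `bin_total` and `parallel_totals` are hypothetical, plain integers stand in for contigs, and the per-bin work is just a sum of lengths. It rescues `ThreadError` explicitly instead of using `rescue nil`, and returns early on empty input because `each_slice(0)` would raise.

```ruby
require 'thread' # Queue and Mutex (already loaded on modern Rubies)

# Hypothetical stand-in for the per-bin work: sum the lengths in one bin.
def bin_total(bin)
  bin.inject(0) { |sum, len| sum + len }
end

# Split +items+ into at most +threads+ bins and process them in a pool.
def parallel_totals(items, threads = 4)
  return [] if items.empty? # each_slice(0) would raise ArgumentError

  queue = Queue.new
  binsize = (items.size / threads.to_f).ceil
  items.each_slice(binsize) { |bin| queue << bin }

  results = []
  lock = Mutex.new
  pool = Array.new(threads) do
    Thread.new do
      loop do
        # non-blocking pop raises ThreadError once the queue is empty
        bin = begin
          queue.pop(true)
        rescue ThreadError
          break
        end
        total = bin_total(bin)
        # guard the shared results array while appending
        lock.synchronize { results << total }
      end
    end
  end

  # wait for every worker before reading the results
  pool.each(&:join)
  results
end

lengths = [300, 1200, 450, 10_500, 800, 2000]
puts parallel_totals(lengths, 3).inject(0, :+)
```

The order of the per-bin results depends on thread scheduling, so any merge step should be order-independent, as the sum, min and max used in `merge_basic_stats` are.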
data/lib/transrate/bowtie2.rb
CHANGED
@@ -21,8 +21,8 @@ module Transrate
       realistic_dist = insertsize + (3 * insertsd)
       unless File.exists? outputname
         # construct bowtie command
-        bowtiecmd = "#{@bowtie2} --very-sensitive-local -p 8 -X #{realistic_dist}" # TODO number of cores should be variable '-p 8'
-        bowtiecmd += " --no-unal"
+        bowtiecmd = "#{@bowtie2} --very-sensitive-local -k 10 -p 8 -X #{realistic_dist}" # TODO number of cores should be variable '-p 8'
+        bowtiecmd += " --no-unal --quiet"
         bowtiecmd += " #{File.basename(file)} -1 #{left}"
         # paired end?
         bowtiecmd += " -2 #{right}" if right
data/lib/transrate/comparative_metrics.rb
CHANGED
@@ -5,7 +5,10 @@ module Transrate
   class ComparativeMetrics
 
     attr_reader :rbh_per_contig
+    attr_reader :rbh_per_reference
     attr_reader :reciprocal_hits
+    attr_reader :reference_coverage
+    attr_reader :has_run
 
     def initialize assembly, reference
       @assembly = assembly
@@ -18,13 +21,17 @@ module Transrate
       @ortholog_hit_ratio = self.ortholog_hit_ratio rbu
       @collapse_factor = self.collapse_factor @ra.r2l_hits
       @reciprocal_hits = rbu.size
+      @rbh_per_reference = @reciprocal_hits.to_f / @reference.size.to_f
+      @reference_coverage = @rbh_per_reference * @collapse_factor
       @rbh_per_contig = @reciprocal_hits.to_f / @assembly.assembly.size.to_f
+      @has_run = true
     end
 
     def comp_stats
       {
         :reciprocal_hits => @reciprocal_hits,
         :rbh_per_contig => @rbh_per_contig,
+        :rbh_per_reference => @rbh_per_reference,
         :ortholog_hit_ratio => @ortholog_hit_ratio,
         :collapse_factor => @collapse_factor
       }
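Note on the new comparative ratios: the arithmetic added in this hunk is easy to check by hand. The sketch below repeats the three formulas with made-up counts. The helper name `comparative_ratios` and all numbers are illustrative assumptions, not values from a real assembly.

```ruby
# Hypothetical helper: derive the comparative ratios from raw counts,
# using the same formulas as the hunk above.
def comparative_ratios(reciprocal_hits, n_reference, n_contigs, collapse_factor)
  rbh_per_reference = reciprocal_hits.to_f / n_reference
  {
    :rbh_per_reference  => rbh_per_reference,
    :rbh_per_contig     => reciprocal_hits.to_f / n_contigs,
    :reference_coverage => rbh_per_reference * collapse_factor
  }
end

# Illustrative numbers only.
ratios = comparative_ratios(150, 200, 600, 1.2)
ratios.each { |name, value| puts "#{name}: #{value.round(3)}" }
```

Because `reference_coverage` is a product of a proportion and the collapse factor, nothing in the formula bounds it at 1. In this hunk it is exposed through `attr_reader` but not added to `comp_stats`.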
data/lib/transrate/express.rb
CHANGED
@@ -15,11 +15,11 @@ module Transrate
     # in the assembly fastafile
     def quantify_expression assembly, samfile
       assembly = assembly.file if assembly.is_a? Assembly
-      cmd = "#{@express} --no-bias-correct #{assembly} #{samfile}"
+      cmd = "#{@express} --no-bias-correct #{File.expand_path assembly} #{File.expand_path samfile}"
       ex_output = 'results.xprs'
       fin_output = "#{assembly}_#{ex_output}"
       unless File.exists? fin_output
-        `#{cmd}
+        `#{cmd} 2>&1`.split(/\n/)[1..30].join("\n")
         File.rename(ex_output, fin_output)
       end
       expression = {}
data/lib/transrate/metric.rb
CHANGED
data/lib/transrate/read_metrics.rb
CHANGED
@@ -5,9 +5,10 @@ module Transrate
     attr_reader :total
     attr_reader :bad
     attr_reader :supported_bridges
-    attr_reader :
+    attr_reader :pr_good_mapping
     attr_reader :percent_mapping
-    attr_reader :
+    attr_reader :prop_expressed
+    attr_reader :has_run
 
     def initialize assembly
       @assembly = assembly
@@ -20,8 +21,10 @@ module Transrate
       samfile = @mapper.map_reads(@assembly.file, left, right, insertsize, insertsd)
       self.analyse_read_mappings(samfile, insertsize, insertsd)
       self.analyse_expression(samfile)
+      @pr_good_mapping = @good.to_f / @num_pairs.to_f
       @percent_mapping = @total.to_f / @num_pairs.to_f * 100.0
-      @pc_good_mapping = @
+      @pc_good_mapping = @pr_good_mapping * 100.0
+      @has_run = true
     end
 
     def read_stats
@@ -44,7 +47,8 @@ module Transrate
         :unrealistic_fragment => @unrealistic_fragment,
         :potential_bridges => @supported_bridges,
         :expressed_contigs => @expressed_contigs,
-        :unexpressed_contigs => @unexpressed_contigs
+        :unexpressed_contigs => @unexpressed_contigs,
+        :percent_expressed => @percent_expressed
       }
     end
 
@@ -183,6 +187,8 @@ module Transrate
           @expressed_contigs += 1
         end
       end
+      @prop_expressed = @expressed_contigs.to_f / @assembly.size
+      @percent_expressed = @prop_expressed * 100.0
     end
 
   end # ReadMetrics
data/lib/transrate/transrater.rb
CHANGED
@@ -6,24 +6,49 @@ module Transrate
     attr_reader :read_metrics
     attr_reader :comparative_metrics
 
-    def initialize assembly, reference, left, right, insertsize=nil, insertsd=nil
+    def initialize assembly, reference, left=nil, right=nil, insertsize=nil, insertsd=nil
       @assembly = assembly.is_a?(Assembly) ? assembly : Assembly.new(assembly)
       @reference = reference.is_a?(Assembly) ? reference : Assembly.new(reference)
       @read_metrics = ReadMetrics.new @assembly
       @comparative_metrics = ComparativeMetrics.new(@assembly, @reference)
     end
 
-    def run left, right, insertsize=nil, insertsd=nil
-
-
-
+    def run left=nil, right=nil, insertsize=nil, insertsd=nil
+      assembly_metrics
+      if left && right
+        read_metrics left, right
+      end
+      comparative_metrics
     end
 
     def assembly_score
-
-
-
-
+      @score, pg, rc = nil
+      if @read_metrics.has_run
+        pg = Metric.new('pg', @read_metrics.pr_good_mapping, 0.0)
+      end
+      if @comparative_metrics.has_run
+        rc = Metric.new('rc', @comparative_metrics.reference_coverage,
+                        0.0)
+      end
+      if (pg && rc)
+        @score = DimensionReduce.dimension_reduce([pg, rc])
+      end
+      return @score
+    end
+
+    def assembly_metrics
+      @assembly.run unless @assembly.has_run
+      @assembly
+    end
+
+    def read_metrics left=nil, right=nil
+      @read_metrics.run(left, right) unless @read_metrics.has_run
+      @read_metrics
+    end
+
+    def comparative_metrics
+      @comparative_metrics.run unless @comparative_metrics.has_run
+      @comparative_metrics
     end
 
     def all_metrics left, right, insertsize=nil, insertsd=nil
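Note on the new accessor methods in `Transrater`: each one runs its metrics object only if `has_run` is not yet set, then returns the object. The sketch below shows that guard in isolation. `SlowMetrics` and `LazyRunner` are hypothetical stand-ins, not transrate classes, and the `runs` counter exists only to make the behaviour observable.

```ruby
# Hypothetical stand-in for a metrics class with an expensive run step.
class SlowMetrics
  attr_reader :has_run, :runs

  def initialize
    @has_run = false
    @runs = 0
  end

  def run
    @runs += 1 # count how often the expensive step executes
    @has_run = true
  end
end

# Wrapper mirroring the "run unless has_run" guard in the hunk above.
class LazyRunner
  def initialize(metrics)
    @metrics = metrics
  end

  # Run the metrics on first access only, then always return the object.
  def metrics
    @metrics.run unless @metrics.has_run
    @metrics
  end
end

runner = LazyRunner.new(SlowMetrics.new)
runner.metrics
runner.metrics
puts runner.metrics.runs
```

One consequence of this design is that arguments passed on later calls are ignored once the first run has completed, which matches how `read_metrics left, right` behaves in the hunk.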
data/lib/transrate/usearch.rb
CHANGED
@@ -42,7 +42,9 @@ module Transrate
     end
 
     def findorfs filepath, output
-
+      if File.exists? output
+        puts "skipping ORF finding: ORF file already exists at #{output}"
+      else
       subcmd = " -findorfs #{filepath}"
       subcmd += " -output #{output}"
       subcmd += " -xlat"
@@ -53,7 +55,10 @@ module Transrate
 
     def run subcmd
       subcmd += " -quiet"
-      `#{@cmd}#{subcmd}`
+      ret = `#{@cmd}#{subcmd} 2>&1`
+      unless $?.exitstatus == 0
+        puts "usearch command failed: #{subcmd}\noutput:\n#{ret}"
+      end
     end
 
   end # Usearch
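Note on `Usearch#run`: the change captures stderr together with stdout and checks the exit status through `$?`. The sketch below shows the same technique on a harmless command. It assumes a POSIX-style shell for the `2>&1` redirection; the helper name `run_command` is hypothetical, and the Ruby interpreter itself is used as the child process so that no external tool is needed.

```ruby
require 'rbconfig'

# Run +cmd+ through the shell, folding stderr into the captured output.
# Returns [output, success]. A sketch, not transrate's own method.
def run_command(cmd)
  output = `#{cmd} 2>&1`
  [output, $?.success?]
end

ruby = RbConfig.ruby
# The child process writes to stderr and exits with a non-zero status.
out, ok = run_command(%(#{ruby} -e "warn 'oops'; exit 3"))
puts "command failed, output:\n#{out}" unless ok
```

Returning the success flag leaves the decision to the caller; the version in the hunk prints a message and carries on.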
data/lib/transrate/version.rb
CHANGED
data/lib/transrate/writer.rb
ADDED
@@ -0,0 +1,18 @@
+module Transrate
+
+  class Writer
+
+    require 'csv'
+
+    def self.write name, data
+      CSV.open(name, 'wb') do |csv|
+        csv << ["metric", "value"]
+        data.each_pair do |k, v|
+          csv << [k, v]
+        end
+      end
+    end
+
+  end # Writer
+
+end # Transrate
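Note on the new `Writer` class: it writes a two-column CSV with a header row followed by one row per metric. The sketch below reproduces that shape as a standalone method so it can be run without the gem. The method name `write_metrics`, the file name and the sample values are assumptions made for illustration.

```ruby
require 'csv'
require 'tmpdir'

# Same shape as Writer.write in the hunk above: one header row,
# then one row per key/value pair.
def write_metrics(path, data)
  CSV.open(path, 'wb') do |csv|
    csv << ["metric", "value"]
    data.each_pair { |k, v| csv << [k, v] }
  end
end

# Write to a temporary directory that is removed when the block exits.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'metrics.csv') # hypothetical file name
  write_metrics(path, 'n_seqs' => 3, 'mean_len' => 512.5)
  puts File.read(path)
end
```

Values are written with their default string form, so numbers come back as strings when the file is read again with `CSV.read`.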
data/test/helper.rb
ADDED
@@ -0,0 +1,16 @@
+require 'simplecov'
+require 'coveralls'
+
+SimpleCov.formatter = SimpleCov::Formatter::MultiFormatter[
+  SimpleCov::Formatter::HTMLFormatter,
+  Coveralls::SimpleCov::Formatter
+]
+SimpleCov.start
+
+require 'test/unit'
+begin; require 'turn/autorun'; rescue LoadError; end
+require 'shoulda-context'
+require 'transrate'
+
+Turn.config.format = :pretty
+Turn.config.trace = 5
data/transrate.gemspec
CHANGED
@@ -7,7 +7,7 @@ Gem::Specification.new do |gem|
|
|
7
7
|
gem.authors = [ "Richard Smith" ]
|
8
8
|
gem.email = "rds45@cam.ac.uk"
|
9
9
|
gem.licenses = ["MIT"]
|
10
|
-
gem.homepage = 'https://github.com/
|
10
|
+
gem.homepage = 'https://github.com/Blahah/transrate'
|
11
11
|
gem.summary = %q{ quality assessment of de-novo transcriptome assemblies }
|
12
12
|
gem.description = %q{ a library and command-line tool for quality assessment of de-novo transcriptome assemblies }
|
13
13
|
gem.version = Transrate::VERSION::STRING.dup
|
@@ -16,14 +16,14 @@ Gem::Specification.new do |gem|
|
|
16
16
|
gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
|
17
17
|
gem.require_paths = %w( lib )
|
18
18
|
|
19
|
-
gem.add_dependency 'rake'
|
20
|
-
gem.add_dependency 'trollop'
|
19
|
+
gem.add_dependency 'rake'
|
20
|
+
gem.add_dependency 'trollop'
|
21
21
|
gem.add_dependency 'which'
|
22
22
|
gem.add_dependency 'bio'
|
23
|
-
gem.add_dependency 'bettersam'
|
23
|
+
gem.add_dependency 'bettersam'
|
24
24
|
|
25
25
|
gem.add_development_dependency 'turn'
|
26
26
|
gem.add_development_dependency 'simplecov'
|
27
27
|
gem.add_development_dependency 'shoulda-context'
|
28
|
-
gem.add_development_dependency 'coveralls', '
|
28
|
+
gem.add_development_dependency 'coveralls', '>= 0.6.7'
|
29
29
|
end
|
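The gemspec pins only coveralls (`>= 0.6.7`); the other dependencies carry no version argument, which the metadata records as `>= 0`. A small sketch of how RubyGems evaluates such constraints, using `Gem::Requirement` and `Gem::Version`; the `satisfies?` helper and the version numbers tried are illustrative only.

```ruby
require 'rubygems'

# Hypothetical helper: does a version string meet a requirement string?
def satisfies?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

# The coveralls constraint from the gemspec.
puts satisfies?('>= 0.6.7', '0.6.6')
puts satisfies?('>= 0.6.7', '0.7.0')

# An add_dependency call with no version argument accepts any release.
puts Gem::Requirement.default
puts satisfies?('>= 0', '0.0.1')
```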
metadata
CHANGED
@@ -1,156 +1,156 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: transrate
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.0.
|
4
|
+
version: 0.0.12
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Richard Smith
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date:
|
11
|
+
date: 2014-04-14 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: rake
|
15
15
|
requirement: !ruby/object:Gem::Requirement
|
16
16
|
requirements:
|
17
|
-
- -
|
17
|
+
- - ">="
|
18
18
|
- !ruby/object:Gem::Version
|
19
|
-
version:
|
19
|
+
version: '0'
|
20
20
|
type: :runtime
|
21
21
|
prerelease: false
|
22
22
|
version_requirements: !ruby/object:Gem::Requirement
|
23
23
|
requirements:
|
24
|
-
- -
|
24
|
+
- - ">="
|
25
25
|
- !ruby/object:Gem::Version
|
26
|
-
version:
|
26
|
+
version: '0'
|
27
27
|
- !ruby/object:Gem::Dependency
|
28
28
|
name: trollop
|
29
29
|
requirement: !ruby/object:Gem::Requirement
|
30
30
|
requirements:
|
31
|
-
- -
|
31
|
+
- - ">="
|
32
32
|
- !ruby/object:Gem::Version
|
33
|
-
version: '
|
33
|
+
version: '0'
|
34
34
|
type: :runtime
|
35
35
|
prerelease: false
|
36
36
|
version_requirements: !ruby/object:Gem::Requirement
|
37
37
|
requirements:
|
38
|
-
- -
|
38
|
+
- - ">="
|
39
39
|
- !ruby/object:Gem::Version
|
40
|
-
version: '
|
40
|
+
version: '0'
|
41
41
|
- !ruby/object:Gem::Dependency
|
42
42
|
name: which
|
43
43
|
requirement: !ruby/object:Gem::Requirement
|
44
44
|
requirements:
|
45
|
-
- -
|
45
|
+
- - ">="
|
46
46
|
- !ruby/object:Gem::Version
|
47
47
|
version: '0'
|
48
48
|
type: :runtime
|
49
49
|
prerelease: false
|
50
50
|
version_requirements: !ruby/object:Gem::Requirement
|
51
51
|
requirements:
|
52
|
-
- -
|
52
|
+
- - ">="
|
53
53
|
- !ruby/object:Gem::Version
|
54
54
|
version: '0'
|
55
55
|
- !ruby/object:Gem::Dependency
|
56
56
|
name: bio
|
57
57
|
requirement: !ruby/object:Gem::Requirement
|
58
58
|
requirements:
|
59
|
-
- -
|
59
|
+
- - ">="
|
60
60
|
- !ruby/object:Gem::Version
|
61
61
|
version: '0'
|
62
62
|
type: :runtime
|
63
63
|
prerelease: false
|
64
64
|
version_requirements: !ruby/object:Gem::Requirement
|
65
65
|
requirements:
|
66
|
-
- -
|
66
|
+
- - ">="
|
67
67
|
- !ruby/object:Gem::Version
|
68
68
|
version: '0'
|
69
69
|
- !ruby/object:Gem::Dependency
|
70
70
|
name: bettersam
|
71
71
|
requirement: !ruby/object:Gem::Requirement
|
72
72
|
requirements:
|
73
|
-
- -
|
73
|
+
- - ">="
|
74
74
|
- !ruby/object:Gem::Version
|
75
|
-
version: 0
|
75
|
+
version: '0'
|
76
76
|
type: :runtime
|
77
77
|
prerelease: false
|
78
78
|
version_requirements: !ruby/object:Gem::Requirement
|
79
79
|
requirements:
|
80
|
-
- -
|
80
|
+
- - ">="
|
81
81
|
- !ruby/object:Gem::Version
|
82
|
-
version: 0
|
82
|
+
version: '0'
|
83
83
|
- !ruby/object:Gem::Dependency
|
84
84
|
name: turn
|
85
85
|
requirement: !ruby/object:Gem::Requirement
|
86
86
|
requirements:
|
87
|
-
- -
|
87
|
+
- - ">="
|
88
88
|
- !ruby/object:Gem::Version
|
89
89
|
version: '0'
|
90
90
|
type: :development
|
91
91
|
prerelease: false
|
92
92
|
version_requirements: !ruby/object:Gem::Requirement
|
93
93
|
requirements:
|
94
|
-
- -
|
94
|
+
- - ">="
|
95
95
|
- !ruby/object:Gem::Version
|
96
96
|
version: '0'
|
97
97
|
- !ruby/object:Gem::Dependency
|
98
98
|
name: simplecov
|
99
99
|
requirement: !ruby/object:Gem::Requirement
|
100
100
|
requirements:
|
101
|
-
- -
|
101
|
+
- - ">="
|
102
102
|
- !ruby/object:Gem::Version
|
103
103
|
version: '0'
|
104
104
|
type: :development
|
105
105
|
prerelease: false
|
106
106
|
version_requirements: !ruby/object:Gem::Requirement
|
107
107
|
requirements:
|
108
|
-
- -
|
108
|
+
- - ">="
|
109
109
|
- !ruby/object:Gem::Version
|
110
110
|
version: '0'
|
111
111
|
- !ruby/object:Gem::Dependency
|
112
112
|
name: shoulda-context
|
113
113
|
requirement: !ruby/object:Gem::Requirement
|
114
114
|
requirements:
|
115
|
-
- -
|
115
|
+
- - ">="
|
116
116
|
- !ruby/object:Gem::Version
|
117
117
|
version: '0'
|
118
118
|
type: :development
|
119
119
|
prerelease: false
|
120
120
|
version_requirements: !ruby/object:Gem::Requirement
|
121
121
|
requirements:
|
122
|
-
- -
|
122
|
+
- - ">="
|
123
123
|
- !ruby/object:Gem::Version
|
124
124
|
version: '0'
|
125
125
|
- !ruby/object:Gem::Dependency
|
126
126
|
name: coveralls
|
127
127
|
requirement: !ruby/object:Gem::Requirement
|
128
128
|
requirements:
|
129
|
-
- -
|
129
|
+
- - ">="
|
130
130
|
- !ruby/object:Gem::Version
|
131
131
|
version: 0.6.7
|
132
132
|
type: :development
|
133
133
|
prerelease: false
|
134
134
|
version_requirements: !ruby/object:Gem::Requirement
|
135
135
|
requirements:
|
136
|
-
- -
|
136
|
+
- - ">="
|
137
137
|
- !ruby/object:Gem::Version
|
138
138
|
version: 0.6.7
|
139
|
-
description:
|
140
|
-
assemblies
|
139
|
+
description: " a library and command-line tool for quality assessment of de-novo transcriptome
|
140
|
+
assemblies "
|
141
141
|
email: rds45@cam.ac.uk
|
142
142
|
executables:
|
143
143
|
- transrate
|
144
144
|
extensions: []
|
145
145
|
extra_rdoc_files: []
|
146
146
|
files:
|
147
|
-
- .gitignore
|
147
|
+
- ".gitignore"
|
148
148
|
- Gemfile
|
149
149
|
- LICENSE
|
150
150
|
- README.md
|
151
|
+
- Rakefile
|
151
152
|
- bin/transrate
|
152
153
|
- lib/transrate.rb
|
153
|
-
- lib/transrate/#assembly.rb#
|
154
154
|
- lib/transrate/assembly.rb
|
155
155
|
- lib/transrate/bowtie2.rb
|
156
156
|
- lib/transrate/comparative_metrics.rb
|
@@ -163,8 +163,10 @@ files:
|
|
163
163
|
- lib/transrate/transrater.rb
|
164
164
|
- lib/transrate/usearch.rb
|
165
165
|
- lib/transrate/version.rb
|
166
|
+
- lib/transrate/writer.rb
|
167
|
+
- test/helper.rb
|
166
168
|
- transrate.gemspec
|
167
|
-
homepage: https://github.com/
|
169
|
+
homepage: https://github.com/Blahah/transrate
|
168
170
|
licenses:
|
169
171
|
- MIT
|
170
172
|
metadata: {}
|
@@ -174,12 +176,12 @@ require_paths:
|
|
174
176
|
- lib
|
175
177
|
required_ruby_version: !ruby/object:Gem::Requirement
|
176
178
|
requirements:
|
177
|
-
- -
|
179
|
+
- - ">="
|
178
180
|
- !ruby/object:Gem::Version
|
179
181
|
version: '0'
|
180
182
|
required_rubygems_version: !ruby/object:Gem::Requirement
|
181
183
|
requirements:
|
182
|
-
- -
|
184
|
+
- - ">="
|
183
185
|
- !ruby/object:Gem::Version
|
184
186
|
version: '0'
|
185
187
|
requirements: []
|
data/lib/transrate/#assembly.rb#
DELETED
@@ -1,130 +0,0 @@
|
|
1
|
-
require 'bio'
|
2
|
-
require 'bettersam'
|
3
|
-
require 'csv'
|
4
|
-
require 'forwardable'
|
5
|
-
|
6
|
-
module Transrate
|
7
|
-
|
8
|
-
class Assembly
|
9
|
-
|
10
|
-
include Enumerable
|
11
|
-
extend Forwardable
|
12
|
-
def_delegators :@assembly, :each, :<<
|
13
|
-
|
14
|
-
attr_accessor :ublast_db
|
15
|
-
attr_accessor :orfs_ublast_db
|
16
|
-
attr_accessor :protein
|
17
|
-
attr_reader :assembly
|
18
|
-
|
19
|
-
# number of bases in the assembly
|
20
|
-
attr_writer :n_bases
|
21
|
-
|
22
|
-
# assembly filename
|
23
|
-
attr_accessor :file
|
24
|
-
|
25
|
-
# assembly n50
|
26
|
-
attr_reader :n50
|
27
|
-
|
28
|
-
# Return a new Assembly.
|
29
|
-
#
|
30
|
-
# - +:file+ - path to the assembly FASTA file
|
31
|
-
def initialize file
|
32
|
-
@file = file
|
33
|
-
@assembly = []
|
34
|
-
@n_bases = 0
|
35
|
-
Bio::FastaFormat.open(file).each do |entry|
|
36
|
-
@n_bases += entry.length
|
37
|
-
@assembly << entry
|
38
|
-
end
|
39
|
-
@assembly.sort_by! { |x| x.length }
|
40
|
-
end
|
41
|
-
|
42
|
-
# Return a new Assembly object by loading sequences
|
43
|
-
# from the FASTA-format +:file+
|
44
|
-
def self.stats_from_fasta file
|
45
|
-
a = Assembly.new file
|
46
|
-
a.basic_stats
|
47
|
-
end
|
48
|
-
|
49
|
-
def run
|
50
|
-
stats = self.basic_stats
|
51
|
-
stats.each_pair do |key, value|
|
52
|
-
ivar = "@#{key.gsub(/ /, '_')}".to_sym
|
53
|
-
self.instance_variable_set(ivar, value)
|
54
|
-
end
|
55
|
-
end
|
56
|
-
|
57
|
-
# Return a hash of statistics about this assembly
|
58
|
-
def basic_stats
|
59
|
-
cumulative_length = 0.0
|
60
|
-
# we'll calculate Nx for all these x
|
61
|
-
x = [90, 70, 50, 30, 10]
|
62
|
-
x2 = x.clone
|
63
|
-
cutoff = x2.pop / 100.0
|
64
|
-
res = []
|
65
|
-
n1k = 0
|
66
|
-
n10k = 0
|
67
|
-
orf_length_sum = 0
|
68
|
-
@assembly.each do |s|
|
69
|
-
n1k += 1 if s.length > 1_000
|
70
|
-
n10k += 1 if s.length > 10_000
|
71
|
-
orf_length_sum += orf_length(s.seq)
|
72
|
-
|
73
|
-
cumulative_length += s.length
|
74
|
-
if cumulative_length >= @n_bases * cutoff
|
75
|
-
res << s.length
|
76
|
-
if x2.empty?
|
77
|
-
cutoff=1
|
78
|
-
else
|
79
|
-
cutoff = x2.pop / 100.0
|
80
|
-
end
|
81
|
-
end
|
82
|
-
end
|
83
|
-
|
84
|
-
mean = cumulative_length / @assembly.size
|
85
|
-
ns = Hash[x.map { |n| "N#{n}" }.zip(res)]
|
86
|
-
{
|
87
|
-
"n_seqs" => @assembly.size,
|
88
|
-
"smallest" => @assembly.first.length,
|
89
|
-
"largest" => @assembly.last.length,
|
90
|
-
"n_bases" => @n_bases,
|
91
|
-
"mean_len" => mean,
|
92
|
-
"n_1k" => n1k,
|
93
|
-
"n_10k" => n10k,
|
94
|
-
"orf percent" => 300*orf_length_sum/(@assembly.size*mean)
|
95
|
-
}.merge ns
|
96
|
-
end
|
97
|
-
|
98
|
-
# finds longest orf in a sequence
|
99
|
-
def orf_length sequence
|
100
|
-
longest=0
|
101
|
-
(1..6).each do |frame|
|
102
|
-
translated = Bio::Sequence::NA.new(sequence).translate(frame)
|
103
|
-
translated.split(/\*/).each do |orf|
|
104
|
-
if orf.length > longest
|
105
|
-
longest=orf.length
|
106
|
-
end
|
107
|
-
end
|
108
|
-
end
|
109
|
-
return longest
|
110
|
-
end
|
111
|
-
|
112
|
-
# return the number of bases in the assembly, calculating
|
113
|
-
# from the assembly if it hasn't already been done.
|
114
|
-
def n_bases
|
115
|
-
unless @n_bases
|
116
|
-
@n_bases = 0
|
117
|
-
@assembly.each { |s| @n_bases += s.length }
|
118
|
-
end
|
119
|
-
@n_bases
|
120
|
-
end
|
121
|
-
|
122
|
-
def print_stats
|
123
|
-
self.basic_stats.map do |k, v|
|
124
|
-
"#{k}#{" " * (20 - (k.length + v.to_i.to_s.length))}#{v.to_i}"
|
125
|
-
end.join("\n")
|
126
|
-
end
|
127
|
-
|
128
|
-
end # Assembly
|
129
|
-
|
130
|
-
end # Transrate
|
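The deleted `basic_stats` method computes Nx values (N10 through N90) by walking the length-sorted contigs and recording a length each time a cumulative-length cutoff is crossed. A minimal sketch of the conventional longest-first definition follows. `nx_stats` is a hypothetical helper, not the gem's code, and its results can differ from the deleted implementation at exact boundaries or when a single contig crosses several cutoffs.

```ruby
# Hypothetical helper: Nx is the length of the contig at which the
# cumulative length, counted from the longest contig down, first
# reaches x% of the total assembly length.
def nx_stats(lengths, xs = [10, 30, 50, 70, 90])
  total = lengths.sum
  pending = xs.sort                 # thresholds, smallest first
  cumulative = 0
  result = {}
  lengths.sort.reverse.each do |len|
    cumulative += len
    # Integer comparison avoids floating-point rounding at boundaries.
    # The while loop lets one long contig satisfy several thresholds.
    while !pending.empty? && cumulative * 100 >= total * pending.first
      result["N#{pending.shift}"] = len
    end
  end
  result
end

# Made-up contig lengths, for illustration only.
p nx_stats([100, 200, 300, 400])
```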