transrate 0.0.10 → 0.0.12

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: e4a7687e1bc2071fe2f043245e1eff90742f4f1e
4
- data.tar.gz: 61ef6386c15fe8d56485c8c5737d3283aefaede3
3
+ metadata.gz: f5f7d2d65376b69682c5e29c318ad35f43a5ea9a
4
+ data.tar.gz: 794238eafb17705f68d82296e53ffa6128bf7141
5
5
  SHA512:
6
- metadata.gz: 42ab6bc0454bd683798c5e9a1d93a7687fd3282bac182275901b89435dc3c90203dc3ffab12ad128fdf4ebab99702df816f3534917a061da6efdbd120e85cf9b
7
- data.tar.gz: fed477db5ad8a33560bdf25a9bff9185638e49e418ccd5fcf9808235034abbc03ea1c7dded07520f4e4d0bff3f784146417a7e11c58cf938a4007bc71e7c15fc
6
+ metadata.gz: 101280a09d847f28165d0a4394bb849af5e339bf782a25b7e09ad45e1fbdd694f441809b09f078848c69ff0607bedc1aff91e87c50839cd0be3a997038f381a8
7
+ data.tar.gz: 1cf8a710b6e7d83139eabd4b8d820a056de19715307b822c3096458cefdec89f195d0727a8b49ccc5ac648bba9e1e8ec007092abcc94796e5a3f6b3ba4c6df99
data/.gitignore CHANGED
@@ -9,7 +9,6 @@ lib/bundler/man
9
9
  pkg
10
10
  rdoc
11
11
  spec/reports
12
- test
13
12
  test/tmp
14
13
  test/version_tmp
15
14
  tmp
@@ -19,3 +18,6 @@ tmp
19
18
  _yardoc
20
19
  doc/
21
20
  .ruby-version
21
+
22
+ # large test files not for repo
23
+ dryrun
data/LICENSE CHANGED
@@ -1,4 +1,11 @@
1
- The MIT License (MIT)
1
+ ## Summary
2
+
3
+ The Ruby code for Transrate is released under the MIT license.
4
+
5
+ SNAP and CD-HIT-2D are bundled as binaries under their respective licenses
6
+ as described below.
7
+
8
+ ## The MIT License (MIT)
2
9
 
3
10
  Copyright (c) 2013 Richard Smith
4
11
 
@@ -18,3 +25,13 @@ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
18
25
  COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
19
26
  IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
20
27
  CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
28
+
29
+ ## SNAP
30
+
31
+ SNAP is distributed as a binary in accordance with its Apache license.
32
+ The source code for SNAP is available at https://github.com/amplab/snap
33
+
34
+ ## CD-HIT-2D
35
+
36
+ CD-HIT-2D is distributed as a binary in accordance with its GPLv2 license.
37
+ The source code for CD-HIT-2D is available at https://code.google.com/p/cdhit/
data/README.md CHANGED
@@ -3,55 +3,57 @@ Transrate
3
3
 
4
4
  Quality analysis and comparison of transcriptome assemblies.
5
5
 
6
- ## Transcriptome assembly quality metrics
7
-
8
- **transrate** implements a variety of established and new metrics.
9
-
10
- note: this list will be expanded soon with detailed explanations and a guide to interpreting the results.
11
-
12
- ### Contig metrics
13
-
14
- * **n_seqs** - the number of contigs in the assembly
15
- * **smallest** - the size of the smallest contig
16
- * **largest** - the size of the largest contig
17
- * **n_bases** - the number of bases included in the assembly
18
- * **mean_len** - the mean length of the contigs
19
- * **n > 1k** - the number of contigs greater than 1,000 bases long
20
- * **n > 10k** - the number of contigs greater than 10,000 bases long
21
- * **nX** - the largest contig size at which at least X% of bases are contained in contigs *longer* than this length
22
-
23
- ### Read mapping metrics
6
+ ## Contents
7
+
8
+ 1. [Development status](https://github.com/Blahah/transrate#development-status)
9
+ 2. [Transcriptome assembly quality metrics](https://github.com/Blahah/transrate#transcriptome-assembly-quality-metrics)
10
+ 3. [Installation](https://github.com/Blahah/transrate#installation)
11
+ 4. [Usage](https://github.com/Blahah/transrate#usage)
12
+ - [Command line](https://github.com/Blahah/transrate#command-line)
13
+ - [example](https://github.com/Blahah/transrate#example)
14
+ - [As a library](https://github.com/Blahah/transrate#as-a-library)
15
+ 5. [Requirements](https://github.com/Blahah/transrate#requirements)
16
+ - [Ruby](https://github.com/Blahah/transrate#ruby)
17
+ - [RubyGems](https://github.com/Blahah/transrate#rubygems)
18
+ - [USEARCH, Bowtie 2, and eXpress](https://github.com/Blahah/transrate#usearch-bowtie2-and-express)
19
+ 6. [Getting help](https://github.com/Blahah/transrate#getting-help)
20
+
21
+ ## Development status
22
+
23
+ This software is in early development. Users should be aware that until the first release is made, features may change faster than the documentation is updated. Nevertheless, we welcome bug reports.
24
+
25
+ [![Gem Version](https://badge.fury.io/rb/transrate.png)][gem]
26
+ [![Build Status](https://secure.travis-ci.org/Blahah/transrate.png?branch=master)][travis]
27
+ [![Dependency Status](https://gemnasium.com/Blahah/transrate.png?travis)][gemnasium]
28
+ [![Code Climate](https://codeclimate.com/github/Blahah/transrate.png)][codeclimate]
29
+ [![Coverage Status](https://coveralls.io/repos/Blahah/transrate/badge.png?branch=master)][coveralls]
30
+
31
+ [gem]: https://badge.fury.io/rb/transrate
32
+ [travis]: https://travis-ci.org/Blahah/transrate
33
+ [gemnasium]: https://gemnasium.com/Blahah/transrate
34
+ [codeclimate]: https://codeclimate.com/github/Blahah/transrate
35
+ [coveralls]: https://coveralls.io/r/Blahah/transrate
24
36
 
25
- * **total** - the total number of reads pairs mapping
26
- * **good** - the number of read pairs mapping in a way indicative of good assembly
27
- * **bad** - the number of reads pairs mapping in a way indicative of bad assembly
28
-
29
- 'Good' pairs are those where both members are aligned, in the correct orientation, either on the same contig or within a plausible distance of the ends of two separate contigs.
30
-
31
- Conversely, 'bad' pairs are those where one of the conditions for being 'good' are not met.
32
-
33
- Additionally, the software calculates whether there is any evidence in the read mappings that different contigs originate from the same transcript. These theoretical links are called bridges, and the number of bridges is shown in the **supported bridges** metric. The list of supported bridges is output to a file, `supported_bridges.csv`, in case you want to make use of the information. At a later date, transrate will include the ability to improve the assembly using this and other information.
34
-
35
- ### Comparative metrics
37
+ ## Transcriptome assembly quality metrics
36
38
 
37
- * **reciprocal hits** - the number of reciprocal best hits against the reference using ublast. A high score indicates that a large number of real transcripts have been assembled.
38
- * **ortholog hit ratio** - the mean ratio of alignment length to reference sequence length. A low score on this metric indicates the assembly contains full-length transcripts.
39
- * **collapse factor** - the mean number of reference proteins mapping to each contig. A high score on this metric indicates the assembly contains chimeras.
39
+ **transrate** implements a variety of established and new metrics. They are explained in detail [on the wiki](https://github.com/Blahah/transrate/wiki/Transcriptome-assembly-quality-metrics).
40
40
 
41
41
  ## Installation
42
42
 
43
- You can install transrate very easily. Just run at the terminal:
43
+ Assuming all the requirements are met (see below), you can install transrate very easily. Just run at the terminal:
44
44
 
45
45
  `gem install transrate`
46
46
 
47
- If that doesn't work, check the requirements below...
47
+ If you're new to linux/unix, there's a detailed tutorial for installing transrate with all the dependencies [on my blog](http://blahah.net/bioinformatics/2013/10/19/installing-transrate/).
48
48
 
49
49
  ## Usage
50
50
 
51
+ ### Command line
52
+
51
53
  `transrate --help` will give you...
52
54
 
53
55
  ```
54
- Transrate v0.0.1a by Richard Smith <rds45@cam.ac.uk>
56
+ Transrate v0.0.10 by Richard Smith <rds45@cam.ac.uk>
55
57
 
56
58
  DESCRIPTION:
57
59
  Analyse a de-novo transcriptome
@@ -61,7 +63,7 @@ assembly using three kinds of metrics:
61
63
  2. read-mapping
62
64
  3. reference-based
63
65
 
64
- Please make sure USEARCH and bowtie2 are both installed
66
+ Please make sure USEARCH, bowtie 2 and eXpress are installed
65
67
  and in the PATH.
66
68
 
67
69
  Bug reports and feature requests at:
@@ -84,18 +86,37 @@ OPTIONS:
84
86
 
85
87
  If you don't include --left and --right read files, the read-mapping based analysis will be skipped. I recommend that you don't align all your reads - just a subset of 500,000 will give you a very good idea of the quality. You can get a subset by running (on a linux system):
86
88
 
87
- `head -2000000 readfile.fastq`
89
+ `head -2000000 left.fastq > left_500k.fastq`
90
+
91
+ `head -2000000 right.fastq > right_500k.fastq`
88
92
 
89
93
  FASTQ records are 4 lines long, so make sure you multiply the number of reads you want by 4, and be sure to run the same command on both the left and right read files.
90
94
 
91
- ### Example
95
+ #### Example
92
96
 
93
97
  ```
94
98
  transrate --assembly assembly.fasta \
95
- --reference reference.fasta \
96
- --left l.fq \
97
- --right r.fq \
98
- --threads 4
99
+ --reference reference.fasta \
100
+ --left l.fq \
101
+ --right r.fq \
102
+ --threads 4
103
+ ```
104
+
105
+ ### As a library
106
+
107
+ ```ruby
108
+ require 'transrate'
109
+
110
+ assembly = Transrate::Assembly.new(File.expand_path('assembly.fasta'))
111
+ reference = Transrate::Assembly.new(File.expand_path('reference.fasta'))
112
+
113
+ t = Transrate::Transrater.new(assembly, reference)
114
+
115
+ left = File.expand_path('left.fq')
116
+ right = File.expand_path('right.fq')
117
+
118
+ puts t.all_metrics(left, right)
119
+ puts t.assembly_score
99
120
  ```
100
121
 
101
122
  ## Requirements
@@ -116,12 +137,14 @@ Your Ruby installation *should* come with RubyGems, the package manager for Ruby
116
137
 
117
138
  `gem --version`
118
139
 
119
- If you don't have it installed, I recommend installing the latest version of Ruby and RubyGems using the RVM instructions above (in the Requirements:Ruby section.
140
+ If you don't have it installed, I recommend installing the latest version of Ruby and RubyGems using the RVM instructions above (in the [Requirements:Ruby](https://github.com/Blahah/transrate#ruby) section).
141
+
142
+ ### Usearch, Bowtie2 and eXpress
120
143
 
121
- ### Usearch and Bowtie2
144
+ Usearch (http://drive5.com/usearch), Bowtie2 (https://sourceforge.net/projects/bowtie-bio/files/bowtie2) and eXpress (http://bio.math.berkeley.edu/eXpress/) must be installed and in your PATH. Additionally, the Usearch binary executable should be named `usearch`.
122
145
 
123
- Usearch (http://drive5.com/usearch) and Bowtie2 (https://sourceforge.net/projects/bowtie-bio/files/bowtie2) must be installed and in your PATH. Additionally, the Usearch binary executable should be named `usearch`.
146
+ ## Getting help
124
147
 
125
- ## Development status
148
+ If you need help using transrate, please post to the [forum here](https://groups.google.com/forum/#!forum/transrate-users).
126
149
 
127
- This software is in very early development. Nevertheless, we welcome bug reports.
150
+ If you think you've found a bug, please post it to the [issues list](https://github.com/Blahah/transrate/issues).
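Note on the README's subsampling advice: the `head -2000000` commands rely on FASTQ records being 4 lines long, so 500,000 reads correspond to 2,000,000 lines. A minimal Ruby sketch of the same idea follows. `subsample_fastq` is a hypothetical helper written for illustration; it is not part of transrate.

```ruby
# Hypothetical helper, not part of transrate: copy the first `n_reads`
# FASTQ records (4 lines each) from `input` to `output`.
LINES_PER_RECORD = 4

def subsample_fastq(input, output, n_reads)
  max_lines = n_reads * LINES_PER_RECORD
  File.open(output, 'w') do |out|
    File.foreach(input).with_index do |line, i|
      # stop once the requested number of records has been copied
      break if i >= max_lines
      out.write(line)
    end
  end
  # return the number of lines requested, e.g. 2_000_000 for 500_000 reads
  max_lines
end
```

As the README says, the same read count should be used for both the left and right files so the pairs stay in step.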
data/Rakefile ADDED
@@ -0,0 +1,8 @@
1
+ require 'rake/testtask'
2
+
3
+ Rake::TestTask.new do |t|
4
+ t.libs << 'test'
5
+ end
6
+
7
+ desc "Run tests"
8
+ task :default => :test
data/bin/transrate CHANGED
@@ -4,21 +4,18 @@ require 'trollop'
4
4
  require 'transrate'
5
5
 
6
6
  opts = Trollop::options do
7
- version "v0.0.1a"
7
+ version Transrate::VERSION::STRING.dup
8
8
  banner <<-EOS
9
9
 
10
- Transrate v0.0.1a by Richard Smith <rds45@cam.ac.uk>
10
+ Transrate v#{Transrate::VERSION::STRING.dup} by Richard Smith <rds45@cam.ac.uk>
11
11
 
12
12
  DESCRIPTION:
13
13
  Analyse a de-novo transcriptome
14
14
  assembly using three kinds of metrics:
15
15
 
16
16
  1. contig-based
17
- 2. read-mapping
18
- 3. reference-based
19
-
20
- Please make sure USEARCH and bowtie2 are both installed
21
- and in the PATH.
17
+ 2. read-mapping (if --left and --right are provided)
18
+ 3. reference-based (if --reference is provided)
22
19
 
23
20
  Bug reports and feature requests at:
24
21
  http://github.com/blahah/transrate
@@ -30,7 +27,7 @@ OPTIONS:
30
27
 
31
28
  EOS
32
29
  opt :assembly, "assembly file in FASTA format", :required => true, :type => String
33
- opt :reference, "reference proteome file in FASTA format", :required => true, :type => String
30
+ opt :reference, "reference proteome file in FASTA format", :type => String
34
31
  opt :left, "left reads file in FASTQ format", :type => String
35
32
  opt :right, "right reads file in FASTQ format", :type => String
36
33
  opt :insertsize, "mean insert size", :default => 200, :type => Integer
@@ -45,59 +42,68 @@ end
45
42
  include Transrate
46
43
 
47
44
  a = Assembly.new opts.assembly
48
- r = Assembly.new opts.reference
45
+ r = opts.reference ? Assembly.new(opts.reference) : nil
49
46
 
50
- puts "\n\nAnalysing assembly: #{opts.assembly}\n\n"
47
+ transrater = Transrater.new(a, r,
48
+ opts.left,
49
+ opts.right,
50
+ opts.insertsize,
51
+ opts.insertsd)
51
52
 
52
- puts "calculating contig stats..."
53
- t0 = Time.now
54
- contig_results = a.basic_stats
55
- puts "...done in #{Time.now - t0} seconds"
53
+ puts "\nAnalysing assembly: #{opts.assembly}\n\n"
56
54
 
57
- read_results = nil
58
- if (opts.left && opts.right)
59
- puts "\ncalculating read diagnostics..."
60
- t0 = Time.now
61
- read_metrics = ReadMetrics.new a
62
- read_metrics.run(opts.left, opts.right)
63
- read_results = read_metrics.read_stats
64
- puts "...done in #{Time.now - t0} seconds"
65
- else
66
- puts "\nno reads provided, skipping read diagnostics"
67
- end
55
+ report_width = 30
68
56
 
69
- puts "\ncalculating comparative metrics..."
57
+ puts "Calculating contig metrics..."
70
58
  t0 = Time.now
71
- comparative_metrics = ComparativeMetrics.new(a, r)
72
- comparative_metrics.run
73
- comparative_results = comparative_metrics.comp_stats
74
- puts "...done in #{Time.now - t0} seconds"
75
-
76
- report_width = 30
59
+ contig_results = transrater.assembly_metrics.basic_stats
77
60
 
78
61
  if contig_results
79
- puts "\n\n"
62
+ puts "\n"
80
63
  puts "Contig metrics:"
81
64
  puts "-" * report_width
82
65
  puts pretty_print_hash(contig_results, report_width)
83
66
  end
84
67
 
85
- if read_results
86
- puts "\n\n"
87
- puts "Read mapping metrics:"
88
- puts "-" * report_width
89
- puts pretty_print_hash(read_results, report_width)
68
+ puts "Contig metrics done in #{Time.now - t0} seconds"
69
+
70
+ read_results = nil
71
+ if (opts.left && opts.right)
72
+ puts "\ncalculating read diagnostics..."
73
+ t0 = Time.now
74
+ read_results = transrater.read_metrics(opts.left, opts.right).read_stats
75
+
76
+ if read_results
77
+ puts "\n"
78
+ puts "Read mapping metrics:"
79
+ puts "-" * report_width
80
+ puts pretty_print_hash(read_results, report_width)
81
+ end
82
+
83
+ puts "Read metrics done in #{Time.now - t0} seconds"
84
+ else
85
+ puts "\nNo reads provided, skipping read diagnostics"
90
86
  end
91
87
 
92
- if comparative_results
93
- puts "\n\n"
94
- puts "Comparative metrics:"
95
- puts "-" * report_width
96
- puts pretty_print_hash(comparative_results, report_width)
88
+ if opts.reference
89
+ puts "\nCalculating comparative metrics..."
90
+ t0 = Time.now
91
+ comparative_results = transrater.comparative_metrics.comp_stats
92
+
93
+ if comparative_results
94
+ puts "\n"
95
+ puts "Comparative metrics:"
96
+ puts "-" * report_width
97
+ puts pretty_print_hash(comparative_results, report_width)
98
+ end
99
+
100
+ puts "Comparative metrics done in #{Time.now - t0} seconds"
97
101
  end
98
102
 
99
- transrater = Transrater.new(a, r, opts.left, opts.right)
100
- transrater.run(opts.left, opts.right)
101
- puts "\n\n"
102
- puts "Overall score #{transrater.assembly_score.to_f.round(2)}"
103
- puts "\n" + "-" * report_width
103
+ puts "\n"
104
+ puts "-" * report_width
105
+ score = transrater.assembly_score
106
+ unless score.nil?
107
+ puts "OVERALL SCORE: #{score.to_f.round(2) * 100}%"
108
+ puts "-" * report_width
109
+ end
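Note on the score formatting above: `score.to_f.round(2) * 100` rounds before scaling, so the multiplication can reintroduce binary floating-point error (for example, `0.57 * 100` evaluates to `56.99999999999999`). A minimal sketch of one alternative, scaling first and rounding last, is shown here. `format_score` is a hypothetical helper, not something the gem defines.

```ruby
# Hypothetical helper, not part of transrate: format a score in the
# range 0..1 as a whole-number percentage.
def format_score(score)
  # Scale first, then round, so any floating-point error introduced by
  # the multiplication is removed by the final rounding step.
  "#{(score.to_f * 100).round}%"
end

puts format_score(0.57) # prints "57%"
```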
data/lib/transrate.rb CHANGED
@@ -10,3 +10,7 @@ require 'transrate/comparative_metrics'
10
10
  require 'transrate/metric'
11
11
  require 'transrate/dimension_reduce'
12
12
  require 'transrate/express'
13
+
14
+ module Transrate
15
+
16
+ end # Transrate
@@ -9,12 +9,13 @@ module Transrate
9
9
 
10
10
  include Enumerable
11
11
  extend Forwardable
12
- def_delegators :@assembly, :each, :<<
12
+ def_delegators :@assembly, :each, :<<, :size, :length
13
13
 
14
14
  attr_accessor :ublast_db
15
15
  attr_accessor :orfs_ublast_db
16
16
  attr_accessor :protein
17
17
  attr_reader :assembly
18
+ attr_reader :has_run
18
19
 
19
20
  # number of bases in the assembly
20
21
  attr_writer :n_bases
@@ -25,7 +26,7 @@ module Transrate
25
26
  # assembly n50
26
27
  attr_reader :n50
27
28
 
28
- # Reuturn a new Assembly.
29
+ # Return a new Assembly.
29
30
  #
30
31
  # - +:file+ - path to the assembly FASTA file
31
32
  def initialize file
@@ -36,71 +37,198 @@ module Transrate
36
37
  @n_bases += entry.length
37
38
  @assembly << entry
38
39
  end
39
- @assembly.sort_by! { |x| x.length }
40
40
  end
41
41
 
42
42
  # Return a new Assembly object by loading sequences
43
43
  # from the FASTA-format +:file+
44
- def self.stats_from_fasta file
44
+ def self.stats_from20_fasta file
45
45
  a = Assembly.new file
46
46
  a.basic_stats
47
47
  end
48
48
 
49
- def run
50
- stats = self.basic_stats
49
+ def run threads=8
50
+ stats = self.basic_stats threads
51
51
  stats.each_pair do |key, value|
52
- ivar = "@#{key.gsub(/ /, '_')}".to_sym
52
+ ivar = "@#{key.gsub(/\ /, '_')}".to_sym
53
+ attr_ivar = "#{key.gsub(/\ /, '_')}".to_sym
54
+ # creates accessors for the variables in stats
55
+ singleton_class.class_eval { attr_accessor attr_ivar }
53
56
  self.instance_variable_set(ivar, value)
54
57
  end
58
+ @has_run = true
55
59
  end
56
60
 
57
- # Return a hash of statistics about this assembly
58
- def basic_stats
61
+ # Return a hash of statistics about this assembly. Stats are
62
+ # calculated in parallel by splitting the assembly into
63
+ # equal-sized bins and calling Assembly#basic_bin_stat on each
64
+ # bin in a separate thread.
65
+
66
+ def basic_stats threads=8
67
+
68
+ # create a work queue to process contigs in parallel
69
+ queue = Queue.new
70
+
71
+ # split the contigs into equal sized bins, one bin per thread
72
+ binsize = (@assembly.size / threads.to_f).ceil
73
+ @assembly.each_slice(binsize) do |bin|
74
+ queue << bin
75
+ end
76
+
77
+ # a classic threadpool - an Array of threads that allows
78
+ # us to assign work to each thread and then aggregate their
79
+ # results when they are all finished
80
+ threadpool = []
81
+
82
+ # assign one bin of contigs to each thread from the queue.
83
+ # each thread will process its bin of contigs and then wait
84
+ # for the others to finish.
85
+ semaphore = Mutex.new
86
+ stats = []
87
+
88
+ threads.times do
89
+ threadpool << Thread.new do |thread|
90
+ # keep looping until we run out of bins
91
+ until queue.empty?
92
+
93
+ # use non-blocking pop, so an exception is raised
94
+ # when the queue runs dry
95
+ bin = queue.pop(true) rescue nil
96
+ if bin
97
+ # calculate basic stats for the bin and append them
98
+ # to the shared stats array under the mutex, so they
99
+ # can be merged once all threads have finished.
100
+ bin_stats = basic_bin_stats bin
101
+ semaphore.synchronize { stats << bin_stats }
102
+ end
103
+ end
104
+ end
105
+ end
106
+
107
+ # collect the stats calculated in each thread and join
108
+ # the threads to terminate them
109
+ threadpool.each(&:join)
110
+
111
+ # merge the collected stats and return them
112
+ merge_basic_stats stats
113
+
114
+ end # basic_stats
115
+
116
+
117
+ # Calculate basic statistics in a single thread for a bin
118
+ # of contigs.
119
+ #
120
+ # Basic statistics are:
121
+ #
122
+ # - N10, N30, N50, N70, N90
123
+ # - number of contigs > 1,000 base pairs long
124
+ # - number of contigs > 10,000 base pairs long
125
+ # - length of the shortest contig
126
+ # - length of the longest contig
127
+ # - number of contigs in the bin
128
+ # - mean contig length
129
+ # - total number of nucleotides in the bin
130
+ # - mean % of contig length covered by the longest ORF
131
+ #
132
+ # @param [Array] bin An array of Bio::Sequence objects
133
+ # representing contigs in the assembly
134
+
135
+ def basic_bin_stats bin
136
+
137
+ # cumulative length is a float so we can divide it
138
+ # accurately later to get the mean length
59
139
  cumulative_length = 0.0
60
- # we'll calculate Nx for all these x
61
- x = [90, 70, 50, 30, 10]
62
- x2 = x.clone
63
- cutoff = x2.pop / 100.0
64
- res = []
140
+
141
+ # we'll calculate Nx for x in [10, 30, 50, 70, 90]
142
+ # to do this we create a stack of the x values and
143
+ # pop the first one to set the first cutoff. when
144
+ # the cutoff is reached we store the nucleotide length and pop
145
+ # the next value to set the next cutoff. we take a copy
146
+ # of the Array so we can use the intact original to collect
147
+ # the results later
148
+ # x = [90, 70, 50, 30, 10]
149
+ # x2 = x.clone
150
+ # cutoff = x2.pop / 100.0
151
+ # res = []
65
152
  n1k = 0
66
153
  n10k = 0
67
154
  orf_length_sum = 0
68
- @assembly.each do |s|
69
- n1k += 1 if s.length > 1_000
70
- n10k += 1 if s.length > 10_000
71
- orf_length_sum += orf_length(s.seq)
72
-
73
- cumulative_length += s.length
74
- if cumulative_length >= @n_bases * cutoff
75
- res << s.length
76
- if x2.empty?
77
- cutoff=1
78
- else
79
- cutoff = x2.pop / 100.0
80
- end
81
- end
155
+
156
+ # sort the contigs in ascending length order
157
+ # and iterate over them
158
+ bin.sort_by! { |c| c.seq.size }
159
+ bin.each do |contig|
160
+
161
+ # increment our long contig counters if this
162
+ # contig is above the thresholds
163
+ n1k += 1 if contig.length > 1_000
164
+ n10k += 1 if contig.length > 10_000
165
+
166
+ # add the length of the longest orf to the
167
+ # running total
168
+ orf_length_sum += orf_length(contig.seq)
169
+
170
+ # increment the cumulative length and check whether the Nx
171
+ # cutoff has been reached. if it has, store the Nx value and
172
+ # get the next cutoff
173
+ cumulative_length += contig.length
174
+ # if cumulative_length >= @n_bases * cutoff
175
+ # res << contig.length
176
+ # if x2.empty?
177
+ # cutoff=1
178
+ # else
179
+ # cutoff = x2.pop / 100.0
180
+ # end
181
+ # end
82
182
  end
83
183
 
184
+ # calculate and return the statistics as a hash
84
185
  mean = cumulative_length / @assembly.size
85
- ns = Hash[x.map { |n| "N#{n}" }.zip(res)]
186
+ # ns = Hash[x.map { |n| "N#{n}" }.zip(res)]
86
187
  {
87
- "n_seqs" => @assembly.size,
88
- "smallest" => @assembly.first.length,
89
- "largest" => @assembly.last.length,
90
- "n_bases" => @n_bases,
188
+ "n_seqs" => bin.size,
189
+ "smallest" => bin.first.length,
190
+ "largest" => bin.last.length,
191
+ "n_bases" => n_bases,
91
192
  "mean_len" => mean,
92
193
  "n_1k" => n1k,
93
194
  "n_10k" => n10k,
94
- "orf percent" => 300*orf_length_sum/(@assembly.size*mean)
95
- }.merge ns
96
- end
195
+ "orf_percent" => 300 * orf_length_sum / (@assembly.size * mean)
196
+ }
197
+ # }.merge ns
198
+
199
+ end # basic_bin_stats
200
+
201
+ def merge_basic_stats stats
202
+ # convert the array of hashes into a hash of arrays
203
+ collect = Hash.new{|h,k| h[k]=[]}
204
+ stats.each_with_object(collect) do |collect, result|
205
+ collect.each{ |k, v| result[k] << v }
206
+ end
207
+ merged = {}
208
+ collect.each_pair do |stat, values|
209
+ if stat == 'orf_percent' || /N[0-9]{2}/ =~ stat
210
+ # store the mean
211
+ merged[stat] = values.inject(:+) / values.size
212
+ elsif stat == 'smallest'
213
+ merged[stat] = values.min
214
+ elsif stat == 'largest'
215
+ merged[stat] = values.max
216
+ else
217
+ # store the sum
218
+ merged[stat] = values.inject(:+)
219
+ end
220
+ end
97
221
 
222
+ merged
223
+
224
+ end # merge_basic_stats
225
+
98
226
  # finds longest orf in a sequence
99
227
  def orf_length sequence
100
228
  longest=0
101
229
  (1..6).each do |frame|
102
230
  translated = Bio::Sequence::NA.new(sequence).translate(frame)
103
- translated.split(/\*/).each do |orf|
231
+ translated.split('*').each do |orf|
104
232
  if orf.length > longest
105
233
  longest=orf.length
106
234
  end
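Note on the binned, threaded statistics above: the pattern is to split the contigs into bins, push the bins onto a queue, let each worker thread pop bins until the queue is empty, and merge the per-bin results at the end. The sketch below shows that pattern with the standard library only, working on plain integer lengths rather than `Bio::Sequence` objects. `parallel_length_stats` is a hypothetical name, and the sketch derives the mean from the merged totals, which is a simplification and not necessarily what the gem does.

```ruby
require 'thread'

# Hypothetical sketch, not transrate's implementation: compute simple
# length statistics by processing bins of lengths in worker threads.
def parallel_length_stats(lengths, threads = 4)
  return {} if lengths.empty?

  # split the lengths into roughly equal bins and queue them
  queue = Queue.new
  binsize = (lengths.size / threads.to_f).ceil
  lengths.each_slice(binsize) { |bin| queue << bin }

  results = []
  lock = Mutex.new

  workers = Array.new(threads) do
    Thread.new do
      loop do
        # non-blocking pop raises ThreadError once the queue is empty
        bin = begin
          queue.pop(true)
        rescue ThreadError
          break
        end
        stats = { 'n_seqs' => bin.size, 'n_bases' => bin.sum,
                  'smallest' => bin.min, 'largest' => bin.max }
        # guard the shared results array while appending
        lock.synchronize { results << stats }
      end
    end
  end
  workers.each(&:join)

  # merge: sums for counts, min/max for extremes, mean from the totals
  n_seqs = results.sum { |r| r['n_seqs'] }
  n_bases = results.sum { |r| r['n_bases'] }
  {
    'n_seqs' => n_seqs,
    'n_bases' => n_bases,
    'smallest' => results.map { |r| r['smallest'] }.min,
    'largest' => results.map { |r| r['largest'] }.max,
    'mean_len' => n_bases.to_f / n_seqs
  }
end
```

Rescuing `ThreadError` inside the loop avoids the gap between checking `queue.empty?` and popping, where another thread could take the last bin.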
@@ -21,8 +21,8 @@ module Transrate
21
21
  realistic_dist = insertsize + (3 * insertsd)
22
22
  unless File.exists? outputname
23
23
  # construct bowtie command
24
- bowtiecmd = "#{@bowtie2} --very-sensitive-local -p 8 -X #{realistic_dist}" # TODO number of cores should be variable '-p 8'
25
- bowtiecmd += " --no-unal"
24
+ bowtiecmd = "#{@bowtie2} --very-sensitive-local -k 10 -p 8 -X #{realistic_dist}" # TODO number of cores should be variable '-p 8'
25
+ bowtiecmd += " --no-unal --quiet"
26
26
  bowtiecmd += " #{File.basename(file)} -1 #{left}"
27
27
  # paired end?
28
28
  bowtiecmd += " -2 #{right}" if right
@@ -5,7 +5,10 @@ module Transrate
5
5
  class ComparativeMetrics
6
6
 
7
7
  attr_reader :rbh_per_contig
8
+ attr_reader :rbh_per_reference
8
9
  attr_reader :reciprocal_hits
10
+ attr_reader :reference_coverage
11
+ attr_reader :has_run
9
12
 
10
13
  def initialize assembly, reference
11
14
  @assembly = assembly
@@ -18,13 +21,17 @@ module Transrate
18
21
  @ortholog_hit_ratio = self.ortholog_hit_ratio rbu
19
22
  @collapse_factor = self.collapse_factor @ra.r2l_hits
20
23
  @reciprocal_hits = rbu.size
24
+ @rbh_per_reference = @reciprocal_hits.to_f / @reference.size.to_f
25
+ @reference_coverage = @rbh_per_reference * @collapse_factor
21
26
  @rbh_per_contig = @reciprocal_hits.to_f / @assembly.assembly.size.to_f
27
+ @has_run = true
22
28
  end
23
29
 
24
30
  def comp_stats
25
31
  {
26
32
  :reciprocal_hits => @reciprocal_hits,
27
33
  :rbh_per_contig => @rbh_per_contig,
34
+ :rbh_per_reference => @rbh_per_reference,
28
35
  :ortholog_hit_ratio => @ortholog_hit_ratio,
29
36
  :collapse_factor => @collapse_factor
30
37
  }
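Note on the two new comparative metrics above: `rbh_per_reference` is the number of reciprocal best hits divided by the number of reference sequences, and `reference_coverage` is that proportion multiplied by the collapse factor. A worked example with made-up numbers follows. The method names are hypothetical and exist only to illustrate the arithmetic.

```ruby
# Hypothetical helpers mirroring the arithmetic in the diff; the input
# values used below are invented for illustration.
def rbh_per_reference(reciprocal_hits, reference_size)
  reciprocal_hits.to_f / reference_size.to_f
end

def reference_coverage(reciprocal_hits, reference_size, collapse_factor)
  rbh_per_reference(reciprocal_hits, reference_size) * collapse_factor
end

# 500 reciprocal best hits against 1,000 reference sequences,
# with a collapse factor of 1.5
puts rbh_per_reference(500, 1000)       # prints 0.5
puts reference_coverage(500, 1000, 1.5) # prints 0.75
```

Because the collapse factor can exceed 1, the product is not bounded above by 1 under this formula.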
@@ -4,6 +4,7 @@ module Transrate
4
4
 
5
5
  def self.dimension_reduce(metrics)
6
6
  total = 0
7
+ p metrics
7
8
  metrics.each do |metric|
8
9
  o = metric.origin
9
10
  w = metric.weighting
@@ -15,11 +15,11 @@ module Transrate
15
15
  # in the assembly fastafile
16
16
  def quantify_expression assembly, samfile
17
17
  assembly = assembly.file if assembly.is_a? Assembly
18
- cmd = "#{@express} --no-bias-correct #{assembly} #{samfile}"
18
+ cmd = "#{@express} --no-bias-correct #{File.expand_path assembly} #{File.expand_path samfile}"
19
19
  ex_output = 'results.xprs'
20
20
  fin_output = "#{assembly}_#{ex_output}"
21
21
  unless File.exists? fin_output
22
- `#{cmd}`
22
+ `#{cmd} 2>&1`.split(/\n/)[1..30].join("\n")
23
23
  File.rename(ex_output, fin_output)
24
24
  end
25
25
  expression = {}
@@ -6,7 +6,7 @@ module Transrate
6
6
 
7
7
  def initialize(name, score, origin)
8
8
  @origin = origin
9
- @score = score
9
+ @score = score ? score : (1 - origin)
10
10
  @name = name
11
11
  @weighting = 1
12
12
  end
@@ -5,9 +5,10 @@ module Transrate
5
5
  attr_reader :total
6
6
  attr_reader :bad
7
7
  attr_reader :supported_bridges
8
- attr_reader :pc_good_mapping
8
+ attr_reader :pr_good_mapping
9
9
  attr_reader :percent_mapping
10
- attr_reader :expressed_contigs
10
+ attr_reader :prop_expressed
11
+ attr_reader :has_run
11
12
 
12
13
  def initialize assembly
13
14
  @assembly = assembly
@@ -20,8 +21,10 @@ module Transrate
20
21
  samfile = @mapper.map_reads(@assembly.file, left, right, insertsize, insertsd)
21
22
  self.analyse_read_mappings(samfile, insertsize, insertsd)
22
23
  self.analyse_expression(samfile)
24
+ @pr_good_mapping = @good.to_f / @num_pairs.to_f
23
25
  @percent_mapping = @total.to_f / @num_pairs.to_f * 100.0
24
- @pc_good_mapping = @good.to_f / @num_pairs.to_f * 100.0
26
+ @pc_good_mapping = @pr_good_mapping * 100.0
27
+ @has_run = true
25
28
  end
26
29
 
27
30
  def read_stats
@@ -44,7 +47,8 @@ module Transrate
44
47
  :unrealistic_fragment => @unrealistic_fragment,
45
48
  :potential_bridges => @supported_bridges,
46
49
  :expressed_contigs => @expressed_contigs,
47
- :unexpressed_contigs => @unexpressed_contigs
50
+ :unexpressed_contigs => @unexpressed_contigs,
51
+ :percent_expressed => @percent_expressed
48
52
  }
49
53
  end
50
54
 
@@ -183,6 +187,8 @@ module Transrate
183
187
  @expressed_contigs += 1
184
188
  end
185
189
  end
190
+ @prop_expressed = @expressed_contigs.to_f / @assembly.size
191
+ @percent_expressed = @prop_expressed * 100.0
186
192
  end
187
193
 
188
194
  end # ReadMetrics
@@ -39,6 +39,7 @@ module Transrate
39
39
  reference_db = File.join(reference_dir, reference_base + ".udb")
40
40
  @usearch.makeudb_ublast @reference.file, reference_db
41
41
  @reference.ublast_db = reference_db
42
+ return reference_db
42
43
  end
43
44
  end
44
45
 
@@ -6,24 +6,49 @@ module Transrate
6
6
  attr_reader :read_metrics
7
7
  attr_reader :comparative_metrics
8
8
 
9
- def initialize assembly, reference, left, right, insertsize=nil, insertsd=nil
9
+ def initialize assembly, reference, left=nil, right=nil, insertsize=nil, insertsd=nil
10
10
  @assembly = assembly.is_a?(Assembly) ? assembly : Assembly.new(assembly)
11
11
  @reference = reference.is_a?(Assembly) ? reference : Assembly.new(reference)
12
12
  @read_metrics = ReadMetrics.new @assembly
13
13
  @comparative_metrics = ComparativeMetrics.new(@assembly, @reference)
14
14
  end
15
15
 
16
- def run left, right, insertsize=nil, insertsd=nil
17
- @assembly.run
18
- @read_metrics.run(left, right)
19
- @comparative_metrics.run
16
+ def run left=nil, right=nil, insertsize=nil, insertsd=nil
17
+ assembly_metrics
18
+ if left && right
19
+ read_metrics left, right
20
+ end
21
+ comparative_metrics
20
22
  end
21
23
 
22
24
  def assembly_score
23
- pg = Metric.new('pg', @read_metrics.pc_good_mapping, 0.0)
24
- rbhpc = Metric.new('rbhpc', @comparative_metrics.rbh_per_contig, 0.0)
25
- ec = Metric.new('ec', @read_metrics.expressed_contigs, 0.0)
26
- @score = DimensionReduce.dimension_reduce([pg, rbhpc, ec])
25
+ @score, pg, rc = nil
26
+ if @read_metrics.has_run
27
+ pg = Metric.new('pg', @read_metrics.pr_good_mapping, 0.0)
28
+ end
29
+ if @comparative_metrics.has_run
30
+ rc = Metric.new('rc', @comparative_metrics.reference_coverage,
31
+ 0.0)
32
+ end
33
+ if (pg && rc)
34
+ @score = DimensionReduce.dimension_reduce([pg, rc])
35
+ end
36
+ return @score
37
+ end
38
+
39
+ def assembly_metrics
40
+ @assembly.run unless @assembly.has_run
41
+ @assembly
42
+ end
43
+
44
+ def read_metrics left=nil, right=nil
45
+ @read_metrics.run(left, right) unless @read_metrics.has_run
46
+ @read_metrics
47
+ end
48
+
49
+ def comparative_metrics
50
+ @comparative_metrics.run unless @comparative_metrics.has_run
51
+ @comparative_metrics
27
52
  end
28
53
 
29
54
  def all_metrics left, right, insertsize=nil, insertsd=nil
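Note on the `has_run` guards above: each accessor runs its analysis only on first use and returns the cached object afterwards, so asking for the score after printing the metrics does not repeat the work. A minimal sketch of that run-once pattern follows. `LazyMetrics` and `Runner` are hypothetical stand-ins, not classes from the gem.

```ruby
# Hypothetical stand-in for a metrics object with an expensive #run.
class LazyMetrics
  attr_reader :has_run, :run_count

  def initialize
    @has_run = false
    @run_count = 0
  end

  def run
    # a real implementation would do the expensive analysis here
    @run_count += 1
    @has_run = true
  end
end

# Hypothetical wrapper showing the guard used in the accessors.
class Runner
  def initialize(metrics)
    @metrics = metrics
  end

  # Run the analysis on first access only, then return the metrics object.
  def metrics
    @metrics.run unless @metrics.has_run
    @metrics
  end
end

m = LazyMetrics.new
r = Runner.new(m)
r.metrics
r.metrics
puts m.run_count # prints 1
```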
@@ -42,7 +42,9 @@ module Transrate
42
42
  end
43
43
 
44
44
  def findorfs filepath, output
45
- unless File.exists? output
45
+ if File.exists? output
46
+ puts "skipping ORF finding: ORF file already exists at #{output}"
47
+ else
46
48
  subcmd = " -findorfs #{filepath}"
47
49
  subcmd += " -output #{output}"
48
50
  subcmd += " -xlat"
@@ -53,7 +55,10 @@ module Transrate
53
55
 
54
56
  def run subcmd
55
57
  subcmd += " -quiet"
56
- `#{@cmd}#{subcmd}`
58
+ ret = `#{@cmd}#{subcmd} 2>&1`
59
+ unless $?.exitstatus == 0
60
+ puts "usearch command failed: #{subcmd}\noutput:\n#{ret}"
61
+ end
57
62
  end
58
63
 
59
64
  end # Usearch
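Note on the exit-status check above: the change captures stderr along with stdout and reports a failed command instead of ignoring it. The same idea can be sketched with `Open3.capture2e` from the Ruby standard library, which returns the combined output and a `Process::Status`. `run_command` is a hypothetical helper, and raising on failure (rather than printing, as the diff does) is a choice made for this sketch.

```ruby
require 'open3'

# Hypothetical helper, not part of transrate: run a shell command,
# capture stdout and stderr together, and raise if it exits non-zero.
def run_command(cmd)
  output, status = Open3.capture2e(cmd)
  unless status.success?
    raise "command failed (#{status.exitstatus}): #{cmd}\noutput:\n#{output}"
  end
  output
end
```

Raising makes the failure visible to the caller, which can then decide whether to abort or skip the dependent metrics.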
@@ -4,7 +4,7 @@ module Transrate
4
4
  module VERSION
5
5
  MAJOR = 0
6
6
  MINOR = 0
7
- PATCH = 10
7
+ PATCH = 12
8
8
  BUILD = nil
9
9
 
10
10
  STRING = [MAJOR, MINOR, PATCH, BUILD].compact.join('.')
@@ -0,0 +1,18 @@
+module Transrate
+
+  class Writer
+
+    require 'csv'
+
+    def self.write name, data
+      CSV.open(name, 'wb') do |csv|
+        csv << ["metric", "value"]
+        data.each_pair do |k, v|
+          csv << [k, v]
+        end
+      end
+    end
+
+  end # Writer
+
+end # Transrate
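The new `Writer.write` takes a file name and a hash, and writes a two-column CSV with a `metric,value` header. The sketch below reproduces that shape in a standalone function so it can be run without the gem. `write_metrics`, the temporary path and the sample values are assumptions made for illustration.

```ruby
require 'csv'
require 'tmpdir'

# Stand-in with the same shape as Writer.write in the hunk above.
def write_metrics name, data
  CSV.open(name, 'wb') do |csv|
    csv << ["metric", "value"]
    data.each_pair { |k, v| csv << [k, v] }
  end
end

# Sample values are made up for the demonstration.
path = File.join(Dir.mktmpdir, "metrics.csv")
write_metrics(path, { "n_seqs" => 3, "mean_len" => 250.5 })

# Reading the file back yields strings, since CSV stores no type information.
rows = CSV.read(path)
puts rows.inspect
```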
data/test/helper.rb ADDED
@@ -0,0 +1,16 @@
+require 'simplecov'
+require 'coveralls'
+
+SimpleCov.formatter = SimpleCov::Formatter::MultiFormatter[
+  SimpleCov::Formatter::HTMLFormatter,
+  Coveralls::SimpleCov::Formatter
+]
+SimpleCov.start
+
+require 'test/unit'
+begin; require 'turn/autorun'; rescue LoadError; end
+require 'shoulda-context'
+require 'transrate'
+
+Turn.config.format = :pretty
+Turn.config.trace = 5
data/transrate.gemspec CHANGED
@@ -7,7 +7,7 @@ Gem::Specification.new do |gem|
   gem.authors = [ "Richard Smith" ]
   gem.email = "rds45@cam.ac.uk"
   gem.licenses = ["MIT"]
-  gem.homepage = 'https://github.com/blahah/transrate'
+  gem.homepage = 'https://github.com/Blahah/transrate'
   gem.summary = %q{ quality assessment of de-novo transcriptome assemblies }
   gem.description = %q{ a library and command-line tool for quality assessment of de-novo transcriptome assemblies }
   gem.version = Transrate::VERSION::STRING.dup
@@ -16,14 +16,14 @@ Gem::Specification.new do |gem|
   gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
   gem.require_paths = %w( lib )
 
-  gem.add_dependency 'rake', '~> 10.1.0'
-  gem.add_dependency 'trollop', '~> 2.0'
+  gem.add_dependency 'rake'
+  gem.add_dependency 'trollop'
   gem.add_dependency 'which'
   gem.add_dependency 'bio'
-  gem.add_dependency 'bettersam', '~> 0.0.1.alpha'
+  gem.add_dependency 'bettersam'
 
   gem.add_development_dependency 'turn'
   gem.add_development_dependency 'simplecov'
   gem.add_development_dependency 'shoulda-context'
-  gem.add_development_dependency 'coveralls', '~> 0.6.7'
+  gem.add_development_dependency 'coveralls', '>= 0.6.7'
 end
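The gemspec hunk above relaxes several version constraints: pessimistic `~>` constraints are either dropped or replaced with `>=`. A short sketch of how the two operators differ, using RubyGems' own `Gem::Requirement` and `Gem::Version` classes; the version numbers tested are arbitrary examples.

```ruby
require 'rubygems'

# '~> 0.6.7' accepts 0.6.7 up to, but not including, 0.7.0.
pessimistic = Gem::Requirement.new('~> 0.6.7')
# '>= 0.6.7' accepts 0.6.7 and anything newer.
open_ended = Gem::Requirement.new('>= 0.6.7')

results = %w[0.6.9 0.7.0 0.8.23].map do |v|
  version = Gem::Version.new(v)
  [v, pessimistic.satisfied_by?(version), open_ended.satisfied_by?(version)]
end

results.each do |v, tilde, gte|
  puts "#{v}: ~> #{tilde}, >= #{gte}"
end
```

A dependency declared with no constraint at all is recorded as `>= 0`, which is what the regenerated `metadata` file shows for `rake`, `trollop` and `bettersam`.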
metadata CHANGED
@@ -1,156 +1,156 @@
 --- !ruby/object:Gem::Specification
 name: transrate
 version: !ruby/object:Gem::Version
-  version: 0.0.10
+  version: 0.0.12
 platform: ruby
 authors:
 - Richard Smith
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-09-29 00:00:00.000000000 Z
+date: 2014-04-14 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - ">="
       - !ruby/object:Gem::Version
-        version: 10.1.0
+        version: '0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - ">="
       - !ruby/object:Gem::Version
-        version: 10.1.0
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: trollop
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '2.0'
+        version: '0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '2.0'
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: which
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: bio
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: bettersam
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - ">="
       - !ruby/object:Gem::Version
-        version: 0.0.1.alpha
+        version: '0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - ">="
       - !ruby/object:Gem::Version
-        version: 0.0.1.alpha
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: turn
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: simplecov
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: shoulda-context
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: coveralls
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - ">="
       - !ruby/object:Gem::Version
         version: 0.6.7
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - ">="
       - !ruby/object:Gem::Version
         version: 0.6.7
-description: ' a library and command-line tool for quality assessment of de-novo transcriptome
-  assemblies '
+description: " a library and command-line tool for quality assessment of de-novo transcriptome
+  assemblies "
 email: rds45@cam.ac.uk
 executables:
 - transrate
 extensions: []
 extra_rdoc_files: []
 files:
-- .gitignore
+- ".gitignore"
 - Gemfile
 - LICENSE
 - README.md
+- Rakefile
 - bin/transrate
 - lib/transrate.rb
-- lib/transrate/#assembly.rb#
 - lib/transrate/assembly.rb
 - lib/transrate/bowtie2.rb
 - lib/transrate/comparative_metrics.rb
@@ -163,8 +163,10 @@ files:
 - lib/transrate/transrater.rb
 - lib/transrate/usearch.rb
 - lib/transrate/version.rb
+- lib/transrate/writer.rb
+- test/helper.rb
 - transrate.gemspec
-homepage: https://github.com/blahah/transrate
+homepage: https://github.com/Blahah/transrate
 licenses:
 - MIT
 metadata: {}
@@ -174,12 +176,12 @@ require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
-  - - '>='
+  - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - '>='
+  - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
@@ -1,130 +0,0 @@
-require 'bio'
-require 'bettersam'
-require 'csv'
-require 'forwardable'
-
-module Transrate
-
-  class Assembly
-
-    include Enumerable
-    extend Forwardable
-    def_delegators :@assembly, :each, :<<
-
-    attr_accessor :ublast_db
-    attr_accessor :orfs_ublast_db
-    attr_accessor :protein
-    attr_reader :assembly
-
-    # number of bases in the assembly
-    attr_writer :n_bases
-
-    # assembly filename
-    attr_accessor :file
-
-    # assembly n50
-    attr_reader :n50
-
-    # Return a new Assembly.
-    #
-    # - +:file+ - path to the assembly FASTA file
-    def initialize file
-      @file = file
-      @assembly = []
-      @n_bases = 0
-      Bio::FastaFormat.open(file).each do |entry|
-        @n_bases += entry.length
-        @assembly << entry
-      end
-      @assembly.sort_by! { |x| x.length }
-    end
-
-    # Return a new Assembly object by loading sequences
-    # from the FASTA-format +:file+
-    def self.stats_from_fasta file
-      a = Assembly.new file
-      a.basic_stats
-    end
-
-    def run
-      stats = self.basic_stats
-      stats.each_pair do |key, value|
-        ivar = "@#{key.gsub(/ /, '_')}".to_sym
-        self.instance_variable_set(ivar, value)
-      end
-    end
-
-    # Return a hash of statistics about this assembly
-    def basic_stats
-      cumulative_length = 0.0
-      # we'll calculate Nx for all these x
-      x = [90, 70, 50, 30, 10]
-      x2 = x.clone
-      cutoff = x2.pop / 100.0
-      res = []
-      n1k = 0
-      n10k = 0
-      orf_length_sum = 0
-      @assembly.each do |s|
-        n1k += 1 if s.length > 1_000
-        n10k += 1 if s.length > 10_000
-        orf_length_sum += orf_length(s.seq)
-
-        cumulative_length += s.length
-        if cumulative_length >= @n_bases * cutoff
-          res << s.length
-          if x2.empty?
-            cutoff=1
-          else
-            cutoff = x2.pop / 100.0
-          end
-        end
-      end
-
-      mean = cumulative_length / @assembly.size
-      ns = Hash[x.map { |n| "N#{n}" }.zip(res)]
-      {
-        "n_seqs" => @assembly.size,
-        "smallest" => @assembly.first.length,
-        "largest" => @assembly.last.length,
-        "n_bases" => @n_bases,
-        "mean_len" => mean,
-        "n_1k" => n1k,
-        "n_10k" => n10k,
-        "orf percent" => 300*orf_length_sum/(@assembly.size*mean)
-      }.merge ns
-    end
-
-    # finds longest orf in a sequence
-    def orf_length sequence
-      longest=0
-      (1..6).each do |frame|
-        translated = Bio::Sequence::NA.new(sequence).translate(frame)
-        translated.split(/\*/).each do |orf|
-          if orf.length > longest
-            longest=orf.length
-          end
-        end
-      end
-      return longest
-    end
-
-    # return the number of bases in the assembly, calculating
-    # from the assembly if it hasn't already been done.
-    def n_bases
-      unless @n_bases
-        @n_bases = 0
-        @assembly.each { |s| @n_bases += s.length }
-      end
-      @n_bases
-    end
-
-    def print_stats
-      self.basic_stats.map do |k, v|
-        "#{k}#{" " * (20 - (k.length + v.to_i.to_s.length))}#{v.to_i}"
-      end.join("\n")
-    end
-
-  end # Assembly
-
-end # Transrate
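The removed file's `basic_stats` computes Nx statistics (N90 through N10) in a single pass over length-sorted contigs. For reference, here is a sketch of the conventional Nx definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least x% of all bases. The `nx` helper and the sample lengths are hypothetical. This is the textbook formulation rather than the author's single-pass code, and the two may disagree at boundary cases.

```ruby
# Conventional Nx: walk contigs from longest to shortest and return the
# length at which the running total first reaches x% of all bases.
def nx lengths, x
  total = lengths.sum
  cumulative = 0
  lengths.sort.reverse.each do |len|
    cumulative += len
    return len if cumulative >= total * x / 100.0
  end
  nil # reached only for an empty list
end

# Made-up contig lengths totalling 20 bases.
lengths = [2, 3, 4, 5, 6]
puts "N50 = #{nx(lengths, 50)}"
puts "N90 = #{nx(lengths, 90)}"
```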