transrate 0.0.1 → 0.0.2

data/Gemfile.lock CHANGED
@@ -2,7 +2,7 @@ PATH
    remote: .
    specs:
      transrate (0.0.1)
-       bettersam
+       bettersam (~> 0.0.1.alpha)
        bio
        rake (~> 10.1.0)
        trollop (~> 2.0)
@@ -13,7 +13,7 @@ GEM
    specs:
      ansi (1.4.3)
      bettersam (0.0.1.alpha)
-     bio (1.4.3)
+     bio (1.4.3.0001)
      colorize (0.5.8)
      coveralls (0.6.7)
        colorize
data/README.md CHANGED
@@ -1,8 +1,42 @@
  Transrate
  ----
 
- Quality analyis and comparison of transcriptome assemblies.
+ Quality analysis and comparison of transcriptome assemblies.
 
+ ## Transcriptome assembly quality metrics
+
+ **transrate** implements a variety of established and new metrics.
+
+ Note: this list will be expanded soon with detailed explanations and a guide to interpreting the results.
+
+ ### Contig metrics
+
+ * **n_seqs** - the number of contigs in the assembly
+ * **smallest** - the size of the smallest contig
+ * **largest** - the size of the largest contig
+ * **n_bases** - the number of bases included in the assembly
+ * **mean_len** - the mean length of the contigs
+ * **n > 1k** - the number of contigs greater than 1,000 bases long
+ * **n > 10k** - the number of contigs greater than 10,000 bases long
+ * **nX** - the largest contig size at which at least X% of bases are contained in contigs *longer* than this length
+
+ ### Read mapping metrics
+
+ * **total** - the total number of read pairs mapping
+ * **good** - the number of read pairs mapping in a way indicative of good assembly
+ * **bad** - the number of read pairs mapping in a way indicative of bad assembly
+
+ 'Good' pairs are those where both members are aligned, in the correct orientation, either on the same contig or within a plausible distance of the ends of two separate contigs.
+
+ Conversely, 'bad' pairs are those where at least one of the conditions for being 'good' is not met.
+
+ Additionally, the software calculates whether there is any evidence in the read mappings that different contigs originate from the same transcript. These theoretical links are called bridges, and the number of bridges is shown in the **supported bridges** metric. The list of supported bridges is output to a file, `supported_bridges.csv`, in case you want to make use of the information. At a later date, transrate will include the ability to improve the assembly using this and other information.
+
+ ### Comparative metrics
+
+ * **reciprocal hits** - the number of reciprocal best hits against the reference using ublast. A high score indicates that a large number of real transcripts have been assembled.
+ * **ortholog hit ratio** - the mean ratio of alignment length to reference sequence length. A high score on this metric indicates the assembly contains full-length transcripts.
+ * **collapse factor** - the mean number of reference proteins mapping to each contig. A high score on this metric indicates the assembly contains chimeras.
 
  ## Installation
 
@@ -14,6 +48,55 @@ If that doesn't work, check the requirements below...
 
  ## Usage
 
+ `transrate --help` will give you...
+
+ ```
+ Transrate v0.0.1a by Richard Smith <rds45@cam.ac.uk>
+
+ DESCRIPTION:
+ Analyse a de-novo transcriptome
+ assembly using three kinds of metrics:
+
+ 1. contig-based
+ 2. read-mapping
+ 3. reference-based
+
+ Please make sure USEARCH and bowtie2 are both installed
+ and in the PATH.
+
+ Bug reports and feature requests at:
+ http://github.com/blahah/transrate
+
+ USAGE:
+ transrate <options>
+
+ OPTIONS:
+ --assembly, -a <s>: assembly file in FASTA format
+ --reference, -r <s>: reference proteome file in FASTA format
+ --left, -l <s>: left reads file in FASTQ format
+ --right, -i <s>: right reads file in FASTQ format
+ --insertsize, -n <i>: mean insert size (default: 200)
+ --insertsd, -s <i>: insert size standard deviation (default: 50)
+ --threads, -t <i>: number of threads to use (default: 8)
+ --version, -v: Print version and exit
+ --help, -h: Show this message
+ ```
+
+ If you don't include the --left and --right read files, the read-mapping analysis will be skipped. I recommend that you don't align all your reads: a subset of 500,000 will give you a very good idea of the quality. You can get a subset by running (on a Linux system):
+
+ `head -2000000 readfile.fastq`
+
+ FASTQ records are 4 lines long, so multiply the number of reads you want by 4, and be sure to run the same command on both the left and right read files.
+
+ ### Example
+
+ ```
+ transrate --assembly assembly.fasta \
+ --reference reference.fasta \
+ --left l.fq \
+ --right r.fq \
+ --threads 4
+ ```
 
  ## Requirements
 
@@ -23,7 +106,7 @@ First, you'll need Ruby v1.9.3 or greater installed. You can check with:
 
  `ruby --version`
 
- If you don't have Ruby installed, or you need a higher version, I recommend using [RVM](http://rvm.io/) as your Ruby Version Manager. To install RVM along with the latest ruby, just run:
+ If you don't have Ruby installed, or you need a higher version, I recommend using [RVM](http://rvm.io/) as your Ruby Version Manager. To install RVM along with the latest Ruby, just run:
 
  `\curl -L https://get.rvm.io | bash -s stable`
 
@@ -35,6 +118,10 @@ Your Ruby installation *should* come with RubyGems, the package manager for Ruby
 
  If you don't have it installed, I recommend installing the latest version of Ruby and RubyGems using the RVM instructions above (in the Requirements:Ruby section.
 
+ ### Usearch and Bowtie2
+
+ Usearch (http://drive5.com/usearch) and Bowtie2 (https://sourceforge.net/projects/bowtie-bio/files/bowtie2) must be installed and in your PATH. Additionally, the Usearch binary executable should be named `usearch`.
+
  ## Development status
 
- This software is in very early development. Nevertheless, we welcome bug reports.
+ This software is in very early development. Nevertheless, we welcome bug reports.
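
The Usage section above recommends aligning only a subset of reads and notes that a FASTQ record spans 4 lines. As a rough illustration of that arithmetic, here is a minimal Ruby sketch that copies the first N records of a file. The helper name `subset_fastq` and the output file names are hypothetical and are not part of transrate.

```ruby
# Hypothetical helper (not part of transrate): copy the first n_reads
# FASTQ records from input_path to output_path.
# Assumes plain, uncompressed FASTQ with exactly 4 lines per record.
def subset_fastq(input_path, output_path, n_reads)
  lines_wanted = n_reads * 4 # 4 lines per FASTQ record
  File.open(output_path, 'w') do |out|
    # File.foreach without a block returns an enumerator, so only the
    # first lines_wanted lines are read from disk.
    File.foreach(input_path).first(lines_wanted).each { |line| out.write(line) }
  end
end

# Usage (run the same call on both files so the pairs stay in sync):
#   subset_fastq('l.fq', 'l.subset.fq', 500_000)
#   subset_fastq('r.fq', 'r.subset.fq', 500_000)
```

Taking the same number of records from the top of both files keeps left and right reads paired, provided the two files list the pairs in the same order.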
data/lib/transrate.rb CHANGED
@@ -1,3 +1,4 @@
+ require 'transrate/transrater'
  require 'transrate/version'
  require 'transrate/assembly'
  require 'transrate/bowtie2'
@@ -19,9 +19,12 @@ class Assembly
      # assembly filename
      attr_accessor :file
 
+     # assembly n50
+     attr_reader :n50
+
      # Reuturn a new Assembly.
      #
-     # - +:assembly+ - an array of Bio::Sequences
+     # - +:file+ - path to the assembly FASTA file
      def initialize file
        @file = file
        @assembly = []
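
The `n50` reader added in this hunk exposes one case of the **nX** metric described in the README. The following is a minimal sketch of how such a value can be computed from a list of contig lengths. It is not transrate's implementation: the helper name `n_x` is hypothetical, and it follows the common convention of counting contigs *at least* as long as the reported length.

```ruby
# Hypothetical helper (not transrate's code): compute NX for a list of
# contig lengths, i.e. the length L such that contigs of length >= L
# together contain at least x percent of all bases.
def n_x(lengths, x)
  raise ArgumentError, 'x must be between 0 and 100' unless (0..100).cover?(x)
  return nil if lengths.empty?

  threshold = lengths.sum * x / 100.0
  cumulative = 0
  # Walk from the longest contig down, accumulating bases.
  lengths.sort.reverse.each do |len|
    cumulative += len
    return len if cumulative >= threshold
  end
end

contigs = [100, 200, 300, 400] # 1,000 bases in total
puts n_x(contigs, 50) # 400 + 300 = 700 bases >= 500, so this prints 300
puts n_x(contigs, 90) # 400 + 300 + 200 = 900 bases >= 900, so this prints 200
```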
@@ -40,6 +43,14 @@ class Assembly
        a.basic_stats
      end
 
+     def run
+       stats = self.basic_stats
+       stats.each_pair do |key, value|
+         # instance variable names need the leading '@' and no spaces
+         ivar = "@#{key.gsub(/ /, '_')}".to_sym
+         self.instance_variable_set(ivar, value)
+       end
+     end
+
      # Return a hash of statistics about this assembly
      def basic_stats
        cumulative_length = 0.0
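
The `run` method in this hunk copies each entry of `basic_stats` into an instance variable so that readers such as `n50` can return a value. A minimal sketch of that mechanism follows. The class name `AssemblySketch` and the example keys and values are hypothetical, since the keys actually returned by `basic_stats` are not shown in this diff. Ruby's `instance_variable_set` expects the name including its leading `@`, and raises `NameError` otherwise.

```ruby
# Stand-in for the Assembly class; only the parts needed for the sketch.
class AssemblySketch
  attr_reader :n_seqs, :n50

  # Hypothetical stats; the real basic_stats derives them from the contigs.
  def basic_stats
    { 'n seqs' => 4, 'n50' => 300 }
  end

  # Copy every statistic into an instance variable of the same name,
  # replacing spaces so the name is a valid identifier.
  def run
    basic_stats.each_pair do |key, value|
      ivar = "@#{key.to_s.gsub(/ /, '_')}".to_sym
      instance_variable_set(ivar, value)
    end
  end
end

assembly = AssemblySketch.new
assembly.run
puts assembly.n_seqs # prints 4
puts assembly.n50    # prints 300
```

Keys that are still not valid identifiers after the substitution (for example a key containing `>`) would also raise `NameError`, so a real implementation would need to sanitise or skip them.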
@@ -4,6 +4,8 @@ module Transrate
 
    class ComparativeMetrics
 
+     attr_accessor :reciprocal_hits
+
      def initialize assembly, reference
        @assembly = assembly
        @reference = reference
@@ -12,12 +14,13 @@ module Transrate
 
      def run
        rbu = self.reciprocal_best_ublast
-       ohr = self.ortholog_hit_ratio rbu
-       cf = self.collapse_factor @ra.l2r_hits
+       @ortholog_hit_ratio = self.ortholog_hit_ratio rbu
+       @collapse_factor = self.collapse_factor @ra.l2r_hits
+       @reciprocal_hits = rbu.size
        {
-         :reciprocal_hits => rbu.size,
-         :ortholog_hit_ratio => ohr,
-         :collapse_factor => cf
+         :reciprocal_hits => @reciprocal_hits,
+         :ortholog_hit_ratio => @ortholog_hit_ratio,
+         :collapse_factor => @collapse_factor
        }
      end
 
@@ -27,10 +30,12 @@ module Transrate
      end
 
      def ortholog_hit_ratio rbu
+       return @ortholog_hit_ratio unless @ortholog_hit_ratio.nil?
        rbu.reduce(0.0){ |sum, hit| sum += hit.last.tcov.to_f } / rbu.size
      end
 
      def collapse_factor hits
+       return @collapse_factor unless @collapse_factor.nil?
        targets = {}
        hits.each_pair do |query, hit|
          unless targets.has_key? query
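
The `ortholog_hit_ratio` method in this hunk averages the target coverage (`tcov`) of the reciprocal best hits and now returns a cached value when one exists. Here is a minimal sketch of the same calculation on stand-in data. The `Hit` struct, the class name, and the sample values are hypothetical; the guard for an empty hit set is an added assumption, since the division in the diff would yield `NaN` in that case.

```ruby
# Hypothetical stand-in for a hit record; only tcov is used here.
Hit = Struct.new(:tcov)

class ComparativeSketch
  # rbu maps a query name to its best hit, mirroring how the diff
  # iterates pairs and reads hit.last.tcov.
  def ortholog_hit_ratio(rbu)
    # Memoised: later calls return the first result, whatever rbu is.
    return @ortholog_hit_ratio unless @ortholog_hit_ratio.nil?
    return 0.0 if rbu.empty? # assumption: report 0.0 rather than NaN

    total = rbu.reduce(0.0) { |sum, pair| sum + pair.last.tcov.to_f }
    @ortholog_hit_ratio = total / rbu.size
  end
end

hits = { 'contig_1' => Hit.new(0.5), 'contig_2' => Hit.new(1.0) }
puts ComparativeSketch.new.ortholog_hit_ratio(hits) # prints 0.75
```

Because the result is cached on the instance, calling the method again with a different set of hits returns the first value; a fresh object is needed for a fresh calculation.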
@@ -2,6 +2,9 @@ module Transrate
 
    class ReadMetrics
 
+     attr_reader :bad
+     attr_reader :supported_bridges
+
      def initialize assembly
        @assembly = assembly
        @mapper = Bowtie2.new
@@ -0,0 +1,19 @@
+ module Transrate
+
+   class Transrater
+
+     def initialize path
+       @assembly = Assembly.new path
+       @read_metrics = ReadMetrics.new @assembly
+       @comparative_metrics = ComparativeMetrics.new @assembly
+     end
+
+     def run
+       @assembly.run
+       @read_metrics.run
+       @comparative_metrics.run
+     end
+
+   end # Transrater
+
+ end # Transrate
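
The new `Transrater` class wires the three metric classes together behind a single `run` call. Note that `ComparativeMetrics#initialize` appears earlier in this diff taking two arguments (`assembly, reference`), while `Transrater` passes only the assembly. The sketch below shows one way the wiring could look with a reference passed through. Everything in it is a stand-in: the module name `TransrateSketch`, the stub classes, their return values, the extra `reference` parameter, and the hash returned by `run` are assumptions, not the gem's API.

```ruby
module TransrateSketch
  # Minimal stubs; the real classes compute the actual metrics.
  class Assembly
    def initialize(file)
      @file = file
    end

    def run
      { 'n_seqs' => 0 }
    end
  end

  class ReadMetrics
    def initialize(assembly)
      @assembly = assembly
    end

    def run
      { :bad => 0 }
    end
  end

  class ComparativeMetrics
    def initialize(assembly, reference)
      @assembly = assembly
      @reference = reference
    end

    def run
      { :reciprocal_hits => 0 }
    end
  end

  # Facade that builds each component once and runs them in order.
  class Transrater
    def initialize(path, reference)
      @assembly = Assembly.new(path)
      @read_metrics = ReadMetrics.new(@assembly)
      @comparative_metrics = ComparativeMetrics.new(@assembly, reference)
    end

    # Assumption: collect all three results instead of returning only the last.
    def run
      {
        :assembly => @assembly.run,
        :reads => @read_metrics.run,
        :comparative => @comparative_metrics.run
      }
    end
  end
end

results = TransrateSketch::Transrater.new('assembly.fasta', 'reference.fasta').run
puts results.keys.inspect # prints [:assembly, :reads, :comparative]
```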
@@ -4,7 +4,7 @@ module Transrate
    module VERSION
      MAJOR = 0
      MINOR = 0
-     PATCH = 1
+     PATCH = 2
      BUILD = nil
 
      STRING = [MAJOR, MINOR, PATCH, BUILD].compact.join('.')
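
The `STRING` constant relies on `compact` to drop a `nil` build component before joining. A small sketch of that behaviour, using a hypothetical helper `version_string` rather than the module's constants:

```ruby
# Hypothetical helper mirroring the STRING expression in the diff:
# nil components are removed by compact, the rest are joined with dots.
def version_string(major, minor, patch, build = nil)
  [major, minor, patch, build].compact.join('.')
end

puts version_string(0, 0, 2)          # prints 0.0.2
puts version_string(0, 0, 2, 'alpha') # prints 0.0.2.alpha
```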
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: transrate
  version: !ruby/object:Gem::Version
-   version: 0.0.1
+   version: 0.0.2
  prerelease:
  platform: ruby
  authors:
  authors:
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2013-09-17 00:00:00.000000000 Z
+ date: 2013-09-19 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rake
@@ -176,6 +176,7 @@ files:
  - lib/transrate/rb_hit.rb
  - lib/transrate/read_metrics.rb
  - lib/transrate/reciprocal_annotation.rb
+ - lib/transrate/transrater.rb
  - lib/transrate/usearch.rb
  - lib/transrate/version.rb
  - transrate.gemspec