bio-gfastqc 0.0.1 → 0.0.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: cac0f88eccf1fbe76912a8a3a0b4fb6a25a436dd
4
- data.tar.gz: 37a6c8e3af76d4c896fb2ae57ff9a34335fa3014
3
+ metadata.gz: ef248148ccf0f2c2ba5d8171ab80eb6e52a30b77
4
+ data.tar.gz: 0689965afa3c8c63a54721718e9e56e86763324d
5
5
  SHA512:
6
- metadata.gz: 44cd149bc4a1d5423b2dedf2fe7fa9da739ef4c3fbbc30e29980e97d6b621b2856230de52aa1069f76dc6c8e90db8e78f0b104badb23de9b9cc7b84439f227bf
7
- data.tar.gz: f97b5e978ebe87225c793ab81e06ddbac514ee80fc9220ee39bf8a418593a0078f422d749d7e440830ec02b0f27147fa0cded93195fe47ffe077ae6957075b89
6
+ metadata.gz: be301904e3135c415db9c65ca330352d3f6ae7364b1c53c0f8aab83ed9045205c9f162440c28aefffedb18db05c0ccaa55c00654762f70f7017c9cd60cb9fbe3
7
+ data.tar.gz: b2b160b23a4f7bd7098eaf7b53c19b64e2da5e5f4d60b42b2aec9b9fa2158e02853704fc561496c65f15216b8a47092591e14bf44bf40912d5f1f08c297be220
data/README.md CHANGED
@@ -2,7 +2,7 @@
2
2
 
3
3
  [![Build Status](https://secure.travis-ci.org/helios/bioruby-gfastqc.png)](http://travis-ci.org/helios/bioruby-gfastqc)
4
4
 
5
- Full description goes here
5
+ Bioinformatics. Aggregate FastQC (quality control for Next Generation Sequencing -NGS-) results from many different samples in a single web page, with charts and tables organized and simplified. The main goal is to speed up the communication process with colleagues (PIs, Biologists, BioInformaticians).
6
6
 
7
7
  Note: this software is under active development!
8
8
 
@@ -20,6 +20,8 @@ require 'bio-gfastqc'
20
20
 
21
21
  The API doc is online. For more code examples see the test files in
22
22
  the source tree.
23
+
24
+ Note: at this time there is no real API; it will follow.
23
25
 
24
26
  ## Project home page
25
27
 
@@ -43,18 +45,31 @@ it contents is for example:
43
45
 
44
46
  then run the script in the directory containing the `config.yml` file, specifying for each sample the sub directory where the FastQC results are located
45
47
 
46
- ruby gfastqc.rb -a R1 -b R2
48
+ gfastqc -a R1 -b R2
47
49
 
48
50
  in case you have the FastQC results in a sub folder and you want to keep the sample definition independent of it, you can use the step option
49
51
 
50
- ruby gfastqc.rb -a R1 -b R2 --step qc_pre_trimming
52
+ gfastqc -a R1 -b R2 --step qc_pre_trimming
51
53
 
52
54
  Then open index.html in your browser
53
55
 
56
+ ### Pipengine
57
+ Pipengine (https://github.com/fstrozzi/bioruby-pipengine) is a simple launcher for complex biological pipelines. Because we are developing it, we found it useful to reuse some of its best practices. An example is the `-s/--step` option, which lets you select the sample's inner directory from which to grab the FastQC results. In the current examples we defined just the samples and their absolute paths, but following the Pipengine directives it is necessary to define another tag:
58
+
59
+ output: /path/where_the_pipe_engine_data_are_processed_and_saved
60
+
61
+ to reuse that tag from gfastqc the user can simply select the option
62
+
63
+ -p/--pipengine
64
+
65
+ the software will then look in the `output` directory for the FastQC results of each sample.
66
+
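A minimal sketch of how the `output` tag might be resolved and turned into a zip path, following the lookup order that `get_output` and `process_sample` use in this release (top-level `output`, then `resources` -> `output`, else the current directory). The configuration text, sample names, paths, and the helper names `resolve_output` and `fastqc_zip_path` are assumptions made for illustration; they are not part of the gem.

```ruby
require 'yaml'

# Hypothetical sample configuration; every path and sample name is made up.
CONFIG_TEXT = <<~YAML
  resources:
    output: /tmp/pipengine_output
  samples:
    sampleA: /data/raw/sampleA
    sampleB: /data/raw/sampleB
YAML

# Hypothetical helper: top-level 'output' first, then resources/output,
# otherwise fall back to the current directory.
def resolve_output(config)
  return config['output'] if config['output']
  resources = config['resources']
  return resources['output'] if resources && resources['output']
  '.'
end

# Hypothetical helper: with -p/--pipengine the zip file is searched under
# output/group/sample/step instead of under the sample path.
def fastqc_zip_path(output, group_name, sample_name, step, base_file_name)
  File.join(output, group_name, sample_name, step, "#{base_file_name}_fastqc.zip")
end

config = YAML.load(CONFIG_TEXT)
output = resolve_output(config)
puts fastqc_zip_path(output, 'groupX', 'sampleA', 'qc_pre_trimming', 'R1')
```

Without `-p/--pipengine`, the path listed for each sample under `samples` is used in place of the `output`/group/sample prefix.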
54
67
 
55
68
 
56
69
  ## TODO
57
70
 
71
+ * ~~read output tag from sample config file (YAML)~~
72
+ * ~~add reference to pipengine~~
58
73
  * avoid requiring the user to specify -a and -b. By default, discover the zip files and, by ordering them, define the first and second strand.
59
74
  * ~~package everything as a gem~~
60
75
  * provide better documentation for installing the gem on multiple systems (GNU/Linux, OSX, Windows)
data/Rakefile CHANGED
@@ -17,8 +17,8 @@ Jeweler::Tasks.new do |gem|
17
17
  gem.name = "bio-gfastqc"
18
18
  gem.homepage = "http://github.com/helios/bioruby-gfastqc"
19
19
  gem.license = "MIT"
20
- gem.summary = %Q{TODO: one-line summary of your gem}
21
- gem.description = %Q{TODO: longer description of your gem}
20
+ gem.summary = %Q{Aggregate FastQC (quality control for Next Generation Sequencing -NGS-)}
21
+ gem.description = %Q{Bioinformatics. Aggregate FastQC (quality control for Next Generation Sequencing -NGS-) results from many different samples in a single web page, with charts and tables organized and simplified. The main goal is to speed up the communication process with colleagues (PIs, Biologists, BioInformaticians).}
22
22
  gem.email = "ilpuccio.febo@gmail.com"
23
23
  gem.authors = ["Raoul Jean Pierre Bonnal"]
24
24
  # dependencies defined in Gemfile
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.0.1
1
+ 0.0.4
@@ -40,6 +40,7 @@ require 'bio-gfastqc'
40
40
  # Bio::Log::CLI.trace('info')
41
41
 
42
42
 
43
+
43
44
  class OptHandlingBox
44
45
  def self.parse(args)
45
46
  options = OpenStruct.new
@@ -54,6 +55,16 @@ class OptHandlingBox
54
55
  options.config = config
55
56
  end
56
57
 
58
+ opts.on('-p', '--pipengine',
59
+ 'if you used pipengine to produce the fastqc results, it is better to use this option. gfastqc will look into the output directory specified in the sample.yml') do
60
+ options.pipengine = true
61
+ end
62
+
63
+ opts.on('-g', '--groups',
64
+ 'if you specified groups in your configuration file, select this option') do
65
+ options.groups = true
66
+ end
67
+
57
68
  opts.on('-s', '--step [STEP]',
58
69
  'choose the step from which extract the fastqc data') do |step|
59
70
  options.step = step
@@ -69,6 +80,11 @@ class OptHandlingBox
69
80
  options.second = second
70
81
  end
71
82
 
83
+ opts.on('-n', '--usename',
84
+ 'use sample name to build the path (maybe later implement placeholding like pipengine)') do
85
+ options.usename = true
86
+ end
87
+
72
88
 
73
89
  opts.on_tail("-h", "--help", "Show this message") do
74
90
  puts opts
@@ -81,66 +97,37 @@ class OptHandlingBox
81
97
  end #parse
82
98
  end #class OptHandlingBox
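A small sketch of how the new boolean switches (`-p`, `-g`, `-n`) and the optional-argument `-s/--step` behave under Ruby's standard `OptionParser`. To stay self-contained it stores the values in a plain `Struct` rather than the `OpenStruct` used by the executable; the `Options` struct, the `parse_flags` method, and the shortened help strings are assumptions for illustration only.

```ruby
require 'optparse'

# Hypothetical container for the parsed flags; unset flags stay nil.
Options = Struct.new(:pipengine, :groups, :usename, :step)

# Hypothetical helper that parses only the switches touched by this diff.
def parse_flags(args)
  options = Options.new
  OptionParser.new do |opts|
    # Switches without an argument simply flip a flag to true.
    opts.on('-p', '--pipengine', 'read results from the output directory') do
      options.pipengine = true
    end
    opts.on('-g', '--groups', 'samples are organized in groups') do
      options.groups = true
    end
    opts.on('-n', '--usename', 'use the sample name to build the path') do
      options.usename = true
    end
    # '[STEP]' in brackets marks the argument as optional.
    opts.on('-s', '--step [STEP]', 'step from which to extract the data') do |step|
      options.step = step
    end
  end.parse!(args)
  options
end

options = parse_flags(%w[-p -s qc_pre_trimming])
puts options.pipengine.inspect
puts options.step.inspect
```

Because unset switches remain `nil`, predicate methods such as `use_pipengine?` can return the stored value directly and still behave as booleans in conditionals.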
83
99
 
100
+
101
+
102
+
103
+
104
+
105
+
106
+
107
+
108
+
84
109
  options = OptHandlingBox.parse(ARGV)
85
110
 
86
- if !File.exists?(options.config)
87
- puts
88
- puts "Warning: there is not config file #{options.config}"
89
- exit
90
- end
91
- config = YAML.load_file(options.config)
92
- @data = {}
93
- @tables = {}
94
- @base_file_names = []
95
- @base_file_names << options.first #this must be mandatory
96
- @base_file_names << options.second if options.second # this can be optional
97
-
98
-
99
- config['samples'].each_pair do |name, path|
100
- @data[name] = {}
101
- @tables[name] = {}
102
- @base_file_names.each do |base_file_name|
103
- @data[name][base_file_name]={}
104
- @tables[name][base_file_name]={}
105
- file = File.join(path, options.step, "#{base_file_name}_fastqc.zip")
106
- Zip::File.open(file) do |zip_file|
107
- zip_file.glob('*/Images/*.png').each do |entry|
108
- @data[name][base_file_name][File.basename(entry.name,".png")]=Base64.encode64(entry.get_input_stream.read)
109
- end #each entry
110
- zip_file.glob('*/fastqc_data.txt').each do |entry|
111
- entry.get_input_stream.read.scan(/>>(.*?)>>END_MODULE/m).each do |match|
112
- match_data = match.first.split("\n")
113
- field_name, status = match_data[0].split("\t")
114
- begin
115
- header = match_data[1].tr("#","").split("\t")
116
- content = match_data[2..-1].map do |data_row|
117
- data_row.split("\t")
118
- end
119
- # puts field_name
120
- @tables[name][base_file_name][field_name.tr(" ","_")]={ "status" => status,
121
- "header"=> header,
122
- "content" => content
123
- }
124
- rescue
125
- # $stderr.puts match_data.inspect #This is a generic warning to notify the user that this records has no data associated with.
126
- end
127
-
128
- end #match
129
- end #fastq data.txt
130
- end #zip
131
- end #files
132
- end
111
+ @gfastqc = Bio::GFASTQC.new(options)
112
+
113
+
114
+
115
+
116
+
117
+
133
118
  # File.open('test.json','w') do |file|
134
119
  # file.write data.to_json
135
120
  # end
136
121
 
137
- @type_images = %w(kmer_profiles per_sequence_gc_content per_base_sequence_content duplication_levels sequence_length_distribution per_sequence_quality per_base_quality adapter_content per_base_n_content per_tile_quality)
138
- @type_tables = %w(Adapter_Content Basic_Statistics Kmer_Content Per_base_N_content Per_base_sequence_content Per_base_sequence_quality Per_sequence_quality_scores Sequence_Duplication_Levels Sequence_Length_Distribution)
122
+ @gfastqc.type_images = %w(kmer_profiles per_sequence_gc_content per_base_sequence_content duplication_levels sequence_length_distribution per_sequence_quality per_base_quality adapter_content per_base_n_content per_tile_quality)
123
+ @gfastqc.type_tables = %w(Adapter_Content Basic_Statistics Kmer_Content Overrepresented_sequences Per_base_N_content Per_base_sequence_content Per_base_sequence_quality Per_sequence_quality_scores Sequence_Duplication_Levels Sequence_Length_Distribution)
139
124
  # tables dropped to avoid wasting space or because they are not very informative:
140
125
  # Per_sequence_GC_content Overrepresented_sequences Per_tile_sequence_quality
141
126
  # Overrepresented_sequences temporary removed it seems that when this fields pass the test there are no data associated with
142
127
 
143
128
 
129
+ # puts @gfastqc.inspect
130
+
144
131
  erb_file = 'index.html.erb'
145
132
 
146
133
  html_file = File.basename(erb_file, '.erb') #=>"index.html"
@@ -1,3 +1,128 @@
1
1
 
2
- module BioGfastqc
3
- end
2
+ module Bio
3
+ class GFASTQC
4
+ attr_accessor :data, :tables, :config, :output, :base_file_names, :step
5
+ attr_accessor :type_images #An array of names of images that must be reported into the html page
6
+ attr_accessor :type_tables #An array of names of tables that must be reported into the html page
7
+ attr_reader :options
8
+
9
+ def initialize(options=OpenStruct.new)
10
+ @options = options
11
+ @config = YAML.load_file(options.config)
12
+ @data = Hash.new { |hash, key| hash[key] = Hash.new { |ihash, ikey| ihash[ikey] = {} } }
13
+ @tables = Hash.new { |hash, key| hash[key] = Hash.new { |ihash, ikey| ihash[ikey] = {} } }
14
+ @base_file_names = []
15
+ @base_file_names << options.first #this must be mandatory
16
+ @base_file_names << options.second if options.second # this can be optional
17
+ @step = options.step
18
+
19
+ if options.pipengine
20
+ unless @config['resources'] && @config['resources']['output']
21
+ raise "Error: If you selected the compatible option -p/--pipengine, an 'output' tag must occur in your configuration file."
22
+ end
23
+ end #pipengine
24
+
25
+ @output = get_output
26
+
27
+ read_each_sample
28
+
29
+ end #initialize
30
+
31
+ def use_pipengine?
32
+ @options.pipengine
33
+ end
34
+
35
+ def use_groups?
36
+ @options.groups
37
+ end
38
+
39
+ def use_sample_name?
40
+ @options.usename
41
+ end
42
+
43
+ def samples
44
+ @config['samples']
45
+ end
46
+
47
+ protected
48
+
49
+ def process_sample(sample_name, path, group_name='')
50
+
51
+ base_file_names.each do |base_file_name|
52
+ # data[name][base_file_name]={}
53
+ # tables[name][base_file_name]={}
54
+ filename = use_sample_name? ? "#{sample_name}_#{base_file_name}" : base_file_name
55
+ file = File.join(use_pipengine? ? File.join(output, group_name, sample_name) : path, step, "#{filename}_fastqc.zip")
56
+ Zip::File.open(file) do |zip_file|
57
+ zip_file.glob('*/Images/*.png').each do |entry|
58
+ field_name = File.basename(entry.name,".png")
59
+ data[sample_name][base_file_name][field_name]=Base64.encode64(entry.get_input_stream.read)
60
+ end #each entry
61
+ zip_file.glob('*/fastqc_data.txt').each do |entry|
62
+ entry.get_input_stream.read.scan(/>>(.*?)>>END_MODULE/m).each do |match|
63
+ match_data = match.first.split("\n")
64
+ field_name, status = match_data[0].split("\t")
65
+ begin
66
+ if field_name == "Overrepresented sequences" && status == 'pass'
67
+ header = []
68
+ content = []
69
+ else
70
+ header = match_data[1].tr("#","").split("\t")
71
+ content = match_data[2..-1].map do |data_row|
72
+ data_row.split("\t")
73
+ end
74
+ end
75
+ # puts field_name
76
+ # puts header
77
+ field_name = field_name.tr(" ","_")
78
+ tables[sample_name][base_file_name][field_name]={ "status" => status,
79
+ "header"=> header,
80
+ "content" => content
81
+ }
82
+ rescue
83
+ $stderr.puts match.inspect #This is a generic warning to notify the user that this record has no data associated with it.
84
+ $stderr.puts field_name
85
+ $stderr.puts status
86
+ end #begin
87
+
88
+ end #match
89
+ end #fastq data.txt
90
+ end #zip
91
+ end #files
92
+
93
+ end #process_sample
94
+
95
+
96
+ def read_each_sample
97
+ if options.groups
98
+ # both branches do the same work: one iterates over groups, the other only over samples (without groups)
99
+ samples.each_pair do |group_name, sample|
100
+ sample.each_pair do |sample_name, path|
101
+ process_sample(sample_name, path, group_name)
102
+ end #iterate over samples
103
+ end #iterate over groups
104
+ else
105
+ samples.each_pair do |sample_name, path|
106
+ process_sample(sample_name, path)
107
+ end #iterate over samples
108
+ end #groups or not
109
+ end
110
+
111
+
112
+
113
+
114
+ def get_output
115
+ if @config['output']
116
+ @config['output']
117
+ elsif options.pipengine && @config['resources'] && @config['resources']['output']
118
+ @config['resources']['output']
119
+ elsif @config['resources'] && @config['resources']['output']
120
+ @config['resources']['output']
121
+ else
122
+ '.'
123
+ end #output
124
+ end #get_output
125
+
126
+
127
+ end #GFASTQC
128
+ end #Bio
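The module parsing in `process_sample` can be sketched in isolation as follows. The input is a made-up excerpt shaped like a FastQC `fastqc_data.txt` file, and `parse_modules` is a hypothetical helper, not a method of `Bio::GFASTQC`. Note one deliberate difference: instead of special-casing `Overrepresented sequences` with status `pass`, this variant guards against any module that has no header or data rows.

```ruby
# Made-up excerpt in the shape of a fastqc_data.txt file (tab-separated).
SAMPLE_TEXT = [
  ">>Basic Statistics\tpass",
  "#Measure\tValue",
  "Filename\tR1.fastq",
  "Total Sequences\t1000",
  ">>END_MODULE",
  ">>Overrepresented sequences\tpass",
  ">>END_MODULE"
].join("\n") + "\n"

# Hypothetical helper: split the text into modules and build one table each.
def parse_modules(text)
  tables = {}
  # Each match is the text between '>>' and the next '>>END_MODULE'.
  text.scan(/>>(.*?)>>END_MODULE/m).each do |match|
    lines = match.first.split("\n")
    field_name, status = lines[0].split("\t")
    # Modules without data have neither a header line nor content rows.
    header  = lines[1] ? lines[1].tr('#', '').split("\t") : []
    content = (lines[2..-1] || []).map { |row| row.split("\t") }
    tables[field_name.tr(' ', '_')] = {
      'status'  => status,
      'header'  => header,
      'content' => content
    }
  end
  tables
end

tables = parse_modules(SAMPLE_TEXT)
tables.each_pair do |name, table|
  puts "#{name}: #{table['status']}, #{table['content'].size} rows"
end
```

An empty `header` is what the template checks (`tables[type]['header'].empty?`) to decide whether to render a table or the "no over represented sequences" message.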
@@ -31,33 +31,68 @@
31
31
 
32
32
  <div role="tabpanel">
33
33
  <ul class="nav nav-tabs" role="tablist" >
34
+ <li role="presentation"><a href="#summaries" aria-controls="summaries" role="tab" data-toggle="tab">Summaries</a></li>
34
35
  <li role="presentation"><a href="#charts" aria-controls="charts" role="tab" data-toggle="tab">Charts</a></li>
35
36
  <li role="presentation"><a href="#tables" aria-controls="tables" role="tab" data-toggle="tab">Tables</a></li>
36
37
  </ul>
37
38
  <div class="tab-content">
39
+ <div role="tabpanel" class="tab-pane" id="summaries">
40
+ <!-- Summaries Start-->
41
+
42
+ <table id="summy" class="display" cellspacing="0" width="100%">
43
+ <thead>
44
+ <tr>
45
+ <% (["source"]+@gfastqc.type_tables).each do |col_name| %>
46
+ <th><%= col_name %></th>
47
+ <% end %>
48
+ </tr>
49
+ </thead>
50
+ <tbody>
51
+ <% @gfastqc.tables.each_pair do |sample_name, reads| %>
52
+ <tr>
53
+ <td><%= sample_name %></td>
54
+ <% @gfastqc.type_tables.each do |col_name| %>
55
+ <td>
56
+ <% reads.each_pair do |reads_name, tables| %>
57
+ <% if tables[col_name]["status"] == 'fail'%>
58
+ <%= "f:#{reads_name}" %>
59
+ <% elsif tables[col_name]["status"] == "warn" %>
60
+ <%= "w:#{reads_name}" %>
61
+ <% end %>
62
+ <% end %>
63
+ </td>
64
+ <% end %>
65
+ </tr>
66
+ <% end %>
67
+ </tbody>
68
+ </table>
69
+
70
+ <!-- Summaries End-->
71
+ </div>
72
+
38
73
 
39
74
  <div role="tabpanel" class="tab-pane" id="charts">
40
75
  <!-- Charts Start-->
41
76
  <div role="tabpanel">
42
77
  <ul class="nav nav-pills" role="tablist" >
43
- <% @type_images.each do |type| %>
78
+ <% @gfastqc.type_images.each do |type| %>
44
79
  <li role="presentation"><a href="#<%= type %>" aria-controls="<%= type %>" role="tab" data-toggle="tab"><%= type %></a></li>
45
80
  <% end %>
46
81
  </ul>
47
82
  <div class="tab-content">
48
- <% @type_images.each do |type| %>
83
+ <% @gfastqc.type_images.each do |type| %>
49
84
  <div role="tabpanel" class="tab-pane" id="<%= type %>">
50
85
  <div class="row">
51
- <% @data.each_pair do |sample_name, pairs| %>
86
+ <% @gfastqc.data.each_pair do |sample_name, pairs| %>
52
87
  <div class="col-xs-6 col-md-6">
53
88
  <div class="caption">
54
89
  <h3><%= sample_name %></h3>
55
- <a href="#" class="thumbnail">
56
- <img data-src="holder.js/300x200" src="data:image/png;base64,<%= pairs[@base_file_names[0]][type] %>" alt="<%= sample_name + ' ' + @base_file_names[0] %>">
57
- </a>
58
- <a href="#" class="thumbnail">
59
- <img data-src="holder.js/300x200" src="data:image/png;base64,<%= pairs[@base_file_names[1]][type] %>" alt="<%= sample_name + ' ' + @base_file_names[1] %>">
90
+ <% pairs.each_pair do |pair, images| %>
91
+ <h4><%= pair %></h4>
92
+ <a href="#" class="thumbnail">
93
+ <img data-src="holder.js/300x200" src="data:image/png;base64,<%= pairs[pair][type] %>" alt="<%= sample_name + ' ' + pair %>">
60
94
  </a>
95
+ <% end %>
61
96
  </div>
62
97
  </div>
63
98
  <% end %>
@@ -73,17 +108,19 @@
73
108
  <!-- Tables Start-->
74
109
  <div role="tabpanel">
75
110
  <ul class="nav nav-pills" role="tablist" >
76
- <% @type_tables.each do |type| %>
111
+ <% @gfastqc.type_tables.each do |type| %>
77
112
  <li role="presentation"><a href="#<%= type %>_div_table" aria-controls="<%= type %>_div_table" role="tab" data-toggle="tab"><%= type %></a></li>
78
113
  <% end %>
79
114
  </ul>
80
115
  <div class="tab-content">
81
116
 
82
- <% @type_tables.each do |type| %>
117
+ <% @gfastqc.type_tables.each do |type| %>
83
118
  <div role="tabpanel" class="tab-pane" id="<%= type %>_div_table">
84
119
  <div class="row">
85
- <% @tables.each_pair do |sample_name, reads| %>
120
+ <% @gfastqc.tables.each_pair do |sample_name, reads| %>
86
121
  <% reads.each_pair do |reads_name, tables| %>
122
+ <% unless tables[type]['header'].empty? %>
123
+ <h3><%= sample_name + "_" + reads_name %></h3>
87
124
  <table id="<%= type %>_<%= sample_name %>_<%= reads_name %>_table" class="display" cellspacing="0" width="100%">
88
125
  <thead>
89
126
  <tr>
@@ -102,6 +139,9 @@
102
139
  <% end %>
103
140
  </tbody>
104
141
  </table>
142
+ <% else %>
143
+ <h3><%= sample_name + "_" + reads_name %> has no over represented sequences.</h3>
144
+ <% end %>
105
145
  <% end %>
106
146
  <% end %>
107
147
  </div>
@@ -109,7 +149,7 @@
109
149
  <% end %>
110
150
  </div>
111
151
  </div>
112
- <!-- Charts End-->
152
+ <!-- Tables End-->
113
153
  </div>
114
154
 
115
155
 
@@ -125,13 +165,16 @@
125
165
  </body>
126
166
  <script type="text/javascript">
127
167
  $(document).ready(function() {
128
- <% @type_tables.each do |type| %>
129
- <% @tables.each_pair do |sample_name, reads| %>
168
+ $("#summy").DataTable();
169
+ <% @gfastqc.type_tables.each do |type| %>
170
+ <% @gfastqc.tables.each_pair do |sample_name, reads| %>
130
171
  <% reads.each_pair do |reads_name, tables| %>
131
- $("#<%= type %>_<%= sample_name %>_<%= reads_name %>_table").DataTable();
172
+ $("#"+"<%= type %>_<%= sample_name %>_<%= reads_name %>_table").DataTable();
173
+
132
174
  <% end %>
133
175
  <% end %>
134
176
  <% end %>
177
+
135
178
  } );
136
179
  </script>
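The new Summaries tab marks each cell with `f:<reads>` for a failed module and `w:<reads>` for a warning. A rough sketch of that cell logic outside ERB, assuming the `tables[sample][reads][module]` shape built by `Bio::GFASTQC`; the `summary_cell` helper and the sample data are hypothetical, and the `fetch` guard for missing modules is an addition that the template itself does not have.

```ruby
# Hypothetical helper: build the text of one summary cell for a sample.
# reads maps a reads name (e.g. 'R1') to its tables hash.
def summary_cell(reads, col_name)
  marks = reads.map do |reads_name, tables|
    # A module missing from the tables is treated as having no status.
    case tables.fetch(col_name, {})['status']
    when 'fail' then "f:#{reads_name}"
    when 'warn' then "w:#{reads_name}"
    end
  end
  # Passing modules yield nil and are dropped from the cell.
  marks.compact.join(' ')
end

# Made-up data for one sample with two reads files.
reads = {
  'R1' => { 'Kmer_Content' => { 'status' => 'fail' } },
  'R2' => { 'Kmer_Content' => { 'status' => 'warn' } }
}
puts summary_cell(reads, 'Kmer_Content')
```

Cells for modules that pass in every reads file stay empty, so only problems stand out in the summary table.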
137
180
 
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bio-gfastqc
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.1
4
+ version: 0.0.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - Raoul Jean Pierre Bonnal
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2015-03-22 00:00:00.000000000 Z
11
+ date: 2015-06-26 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rubyzip
@@ -111,7 +111,7 @@ dependencies:
111
111
  description: Bioinformatics. Aggregate FastQC (quality control for Next Generation
112
112
  Sequencing -NGS-) results from many different samples in a single web page, with
113
113
  charts and tables organized and simplified. The main goal is to speed up the communication
114
- process with out colleagues (PIs, Biologists, BioInformaticians).
114
+ process with colleagues (PIs, Biologists, BioInformaticians).
115
115
  email: ilpuccio.febo@gmail.com
116
116
  executables:
117
117
  - gfastqc
@@ -157,8 +157,5 @@ rubyforge_project:
157
157
  rubygems_version: 2.4.3
158
158
  signing_key:
159
159
  specification_version: 4
160
- summary: Bioinformatics. Aggregate FastQC (quality control for Next Generation Sequencing
161
- -NGS-) results from many different samples in a single web page, with charts and
162
- tables organized and simplified. The main goal is to speed up the communication
163
- process with out colleagues (PIs, Biologists, BioInformaticians).
160
+ summary: Aggregate FastQC (quality control for Next Generation Sequencing -NGS-)
164
161
  test_files: []