bio-pipengine 0.6.0 → 0.8.0
- checksums.yaml +4 -4
- data/README.md +56 -1
- data/VERSION +1 -1
- data/bin/pipengine +42 -37
- data/lib/bio-pipengine.rb +3 -2
- data/lib/bio/pipengine.rb +76 -67
- data/lib/bio/pipengine/job.rb +21 -28
- data/lib/bio/pipengine/sample.rb +24 -3
- metadata +11 -64
checksums.yaml
CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7648de07e1bf263a59454e76be6d0c1128c863bf
+  data.tar.gz: 55b6a9a321c9912c8d0eec4dbd58eea86bbcfb20
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d6734cc44bc8651dbdb4767834c98888962d0d5d6b4fe4dd2e05be8e145269d997ad576a25afc32273278c9ddaea9e06446179bbb389776c3895f9454e94e963
+  data.tar.gz: d8e99755c20869a356dbda09577a47ec52b9a58e4e16817551a74c197052d4a9ce663c07fd68fdee944f279fae9e8580a564cdd602eee86fa3487c62c2940938
```
data/README.md
CHANGED

````diff
@@ -7,6 +7,26 @@ PipEngine will generate runnable shell scripts, already configured for the PBS/T
 
 PipEngine is best suited for NGS pipelines, but it can be used for any kind of pipeline that can be run on a job scheduling system.
 
+Installation
+============
+
+If you already have Ruby, just install PipEngine using RubyGems:
+
+```shell
+gem install bio-pipengine
+```
+
+If you don't have Ruby installed, first follow this simple step to get it:
+
+```shell
+curl -sSL https://get.rvm.io | bash -s stable
+```
+
+and then install PipEngine using RubyGems:
+
+```shell
+gem install bio-pipengine
+```
 
 :: Topics ::
 ============
@@ -85,6 +105,8 @@ With this mode, PipEngine will submit pipeline jobs to the scheduler.
                          (the order matters)
         --group, -g <s>: Specify the group of samples to run the
                          pipeline steps on (do not specify --multi)
+        --allgroups, -a: Apply the step(s) to all the groups defined in
+                         the samples file
          --name, -n <s>: Analysis name
    --output-dir, -o <s>: Output directory (override standard output
                          directory names)
@@ -403,6 +425,39 @@ echo '<multi1>' | sed -e 's/,/ /g' | xargs ls >> gtf_list.txt
 
 This line generates the input file for Cuffcompare with the list of the transcripts.gtf files for each sample, generated using the 'multi' definition in the pipeline YAML and the line passed through the **-m** parameter, but getting rid of the commas that separate sample names. It's a workaround and not a super clean solution, but PipEngine aims to be a general tool not bound to specific corner cases, and it always lets the user define their own custom command lines to manage particular steps, as in this case.
 
+Composable & modular step definitions
+------------------------------------
+
+Until now, steps have been defined inside a single YAML file. This approach is useful for keeping an analysis pipeline stable and reproducible. But what if multiple users want to collaborate on the same pipeline, improving it and, most importantly, re-using the same steps in different analyses? The result is a proliferation of highly similar pipelines that are very complicated to compare and to maintain over time.
+In this scenario, the first thing a developer imagines is the ability to include external files; unfortunately, YAML does not implement this feature. A possible workaround, since we are in Ruby land, is to embed some Ruby code into the YAML file and include external steps.
+
+Create a file `mapping.yml` that describes the mapping step with BWA:
+
+```
+mapping:
+  cpu: 8
+  desc: Run BWA MEM and generate a sorted BAM file
+  run:
+   - <bwa> mem -t <cpu> -R '@RG\tID:<flowcell>\tLB:<sample>\tPL:ILLUMINA\tPU:<flowcell>\tCN:PTP\tSM:<sample>' <index> <trim/sample>.trim.fastq | <samtools> view -bS - > <sample>.bam
+   - <samtools> sort -@ <cpu> <sample>.bam <sample>.sort
+   - rm -f <sample>.bam
+```
+
+It is then possible to include the `mapping.yml` file inside your pipeline with a snippet of Ruby code: `<%= include :name_of_the_step, "file_step.yml" %>`.
+Right now it is very important that you place the tag at the very start of the line (no spaces at the beginning of the line):
+
+```
+steps:
+<%= include :mapping, "./mapping.yml" %>
+
+  index:
+    desc: Make BAM index
+    run: <samtools> index <mapping/sample>.sort.bam
+```
+
+and later run pipengine as usual.
+TODO: Dump the whole pipeline file for reproducibility purposes.
+
 
 :: What happens at run-time ::
 ==============================
@@ -628,4 +683,4 @@ If a specific queue needs to be selected for sending the jobs to PBS, the ```--p
 Copyright
 =========
 
-(c)2013 Francesco Strozzi
+(c)2013 Francesco Strozzi, Raoul Jean Pierre Bonnal
````
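The ERB-include trick described in the README addition above can be sketched end to end: the ERB tag is expanded first, and only the resulting text is handed to the YAML parser, so the included step's lines end up nested under `steps:`. The `include_step` helper and the file contents below are illustrative stand-ins (assumptions), not PipEngine's actual `include` implementation.

```ruby
require 'erb'
require 'yaml'

# Hypothetical helper mirroring the idea: read the external step file and
# indent every line by two spaces so it nests under the "steps:" mapping.
def include_step(filename)
  File.readlines(filename).map { |line| "  " + line.chomp }.join("\n")
end

# A minimal external step definition (made-up content).
File.write("mapping.yml", <<~YAML)
  mapping:
    desc: Run BWA MEM and generate a sorted BAM file
YAML

# The pipeline template: the ERB tag sits at the start of the line,
# exactly as the README requires.
pipeline_template = <<~ERB
  steps:
  <%= include_step "mapping.yml" %>
    index:
      desc: Make BAM index
ERB

# Expand ERB first, then parse the expanded text as YAML.
pipeline = YAML.load(ERB.new(pipeline_template).result(binding))
puts pipeline["steps"].keys.inspect
```

After expansion, both the included `mapping` step and the locally defined `index` step are ordinary keys of the `steps` mapping, which is why the rest of PipEngine does not need to know the step came from another file.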
data/VERSION
CHANGED

```diff
@@ -1 +1 @@
-0.
+0.8.0
```
data/bin/pipengine
CHANGED

```diff
@@ -3,12 +3,12 @@
 $:<< File.expand_path(File.join(File.dirname(File.dirname __FILE__),"lib"))
 require 'bio-pipengine'
 
-banner_text = "\nLauncher for Complex Biological Pipelines . Copyright(C) 2013 Francesco Strozzi\n\n"
+banner_text = "\nLauncher for Complex Biological Pipelines . Copyright(C) 2013 Francesco Strozzi, Raoul Jean Pierre Bonnal\n\n"
 version_text = File.read File.expand_path(File.join(File.dirname(File.dirname __FILE__),"VERSION"))
 SUB_COMMANDS = %w(run jobs)
 
 
-Bio::Pipengine.check_config
+#Bio::Pipengine.check_config
 
 options = {}
 cmd = ARGV.first # get the subcommand
@@ -22,29 +22,27 @@ when "run"
   opt :samples, "List of sample names to run the pipeline", :type => :strings, :short => "l"
   opt :steps, "List of steps to be executed", :type => :strings, :short => "s"
   opt :dry,"Dry run. Just create the job script without submitting it to the batch system", :short => "d"
-  opt :spooler,"Destination spooler PBS, plain shell script", :short =>"x", :type => :string, :default => "pbs"
   opt :tmp, "Temporary output folder", :type => :string, :short => "t"
   opt :create_samples, "Create samples.yml file from a Sample directory (only for CASAVA projects)", :short => "c", :type => :strings
   opt :multi, "List of samples to be processed by a given step (the order matters)", :short => "m", :type => :strings
   opt :group, "Specify the group of samples to run the pipeline steps on (do not specify --multi)", :short => "g", :type => :string
+  opt :allgroups, "Apply the step(s) to all the groups defined into the samples file", :short => "a"
   opt :name, "Analysis name", :short => "n", :type => :string
   opt :output_dir, "Output directory (override standard output directory names)", :short => "o", :type => :string
   opt :pbs_opts, "PBS options", :type => :strings, :short => "b"
   opt :pbs_queue, "PBS queue", :type => :string, :short => "q"
   opt :inspect_pipeline, "Show steps", :short => "i", :type => :string
-  opt :mail_exit, "Send an Email when the job terminates", :type => :string
-  opt :mail_start, "Send an Email when the job starts", :type => :string
   opt :log, "Log script activities, by default stdin. Options are fluentd", :type => :string, :default => "stdin"
   opt :log_adapter, "(stdin|syslog|fluentd) In case of fluentd use http://destination.hostname:port/yourtag", :type => :string
 end
-when "jobs"
-  ARGV.shift
-  options[:jobs] = true
-  Trollop::options do
-    opt :job_id, "Search submitted jobs by Job ID", :type => :strings, :short => "i"
-    opt :job_name, "Search submitted jobs by Job Name", :type => :strings, :short => "n"
-    opt :delete, "Delete submitted jobs ('all' to erase everything or type one or more job IDs)", :short => "d", :type => :strings
-  end
+#when "jobs"
+#  ARGV.shift
+#  options[:jobs] = true
+#  Trollop::options do
+#    opt :job_id, "Search submitted jobs by Job ID", :type => :strings, :short => "i"
+#    opt :job_name, "Search submitted jobs by Job Name", :type => :strings, :short => "n"
+#    opt :delete, "Delete submitted jobs ('all' to erase everything or type one or more job IDs)", :short => "d", :type => :strings
+#  end
 when "-h"
   puts banner_text
   puts "List of available commands:\n\trun\tSubmit pipelines to the job scheduler\n\tjobs\tShow statistics and interact with running jobs\n"
@@ -63,29 +61,29 @@ Trollop::die :multi, "Specifing both --group and --multi is not allowed" if opti
 
 if options[:create_samples]
   Bio::Pipengine.create_samples options[:create_samples]
-elsif options[:jobs]
-  if options[:job_id]
-    Bio::Pipengine.show_stats(options[:job_id])
-  elsif options[:job_name]
-    warn "Not yet implemented"
-    exit
-  elsif options[:delete]
-    if options[:delete].empty?
-      warn "Provide one or more Job IDs or write 'all' to delete all your running jobs".red
-      exit
-    end
-    puts "Warning: this will delete the following running jobs: ".light_blue + "#{options[:delete].join(",")}".green
-    print "Are you sure? (y|n):"
-    answer = gets.chomp
-    if answer == "y"
-      Bio::Pipengine.delete_jobs(options[:delete])
-    else
-      puts "Aborting..."
-      exit
-    end
-  else
-    Bio::Pipengine.show_stats(["all"])
-  end
+#elsif options[:jobs]
+#  if options[:job_id]
+#    Bio::Pipengine.show_stats(options[:job_id])
+#  elsif options[:job_name]
+#    warn "Not yet implemented"
+#    exit
+#  elsif options[:delete]
+#    if options[:delete].empty?
+#      warn "Provide one or more Job IDs or write 'all' to delete all your running jobs".red
+#      exit
+#    end
+#    puts "Warning: this will delete the following running jobs: ".light_blue + "#{options[:delete].join(",")}".green
+#    print "Are you sure? (y|n):"
+#    answer = gets.chomp
+#    if answer == "y"
+#      Bio::Pipengine.delete_jobs(options[:delete])
+#    else
+#      puts "Aborting..."
+#      exit
+#    end
+#  else
+#    Bio::Pipengine.show_stats(["all"])
+#  end
 elsif options[:pipeline] && options[:samples_file]
   if options[:inspect_pipeline]
     Bio::Pipengine.inspect_steps(options[:inspect_pipeline])
@@ -94,7 +92,14 @@ elsif options[:pipeline] && options[:samples_file]
   abort("File not found: #{options[:pipeline]}".red) unless File.exists? options[:pipeline]
   abort("File not found: #{options[:samples_file]}".red) unless File.exists? options[:samples_file]
   abort("Please provide a valid step name with the --step parameter".red) unless options[:steps]
-
+  if options[:allgroups]
+    Bio::Pipengine.load_samples_file(options[:samples_file])["samples"].keys.each do |group|
+      options[:group] = group
+      Bio::Pipengine.run(options)
+    end
+  else
+    Bio::Pipengine.run(options)
+  end
 end
 end
 
```
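The new `--allgroups` handling in `bin/pipengine` simply iterates over the top-level keys of the samples file and triggers one run per group. A minimal sketch of that dispatch logic, with a lambda standing in for `Bio::Pipengine.run` (the sample data and the runner are assumptions for illustration):

```ruby
# Sketch of the --allgroups dispatch: one pipeline run per top-level
# group key in the samples file.
samples = { "samples" => { "groupA" => {}, "groupB" => {} } }
options = { allgroups: true }

runs = []
runner = ->(opts) { runs << opts[:group] }  # stand-in for Bio::Pipengine.run

if options[:allgroups]
  samples["samples"].keys.each do |group|
    options[:group] = group   # each iteration rewrites :group before running
    runner.call(options)
  end
else
  runner.call(options)
end
```

Note that the same mutable `options` hash is reused across iterations; only `:group` changes between runs.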
data/lib/bio-pipengine.rb
CHANGED

```diff
@@ -3,9 +3,10 @@ require 'yaml'
 require 'securerandom'
 require 'trollop'
 require 'colorize'
-require 'torque_rm'
-require 'terminal-table'
+#require 'torque_rm'
+#require 'terminal-table'
 require 'fileutils'
+require 'logger'
 
 require 'bio/pipengine/sample'
 require 'bio/pipengine/step'
```
data/lib/bio/pipengine.rb
CHANGED

```diff
@@ -1,22 +1,16 @@
 module Bio
   module Pipengine
+
+    def self.include(name, filename)
+      File.readlines(filename).map {|line| "  "+line}.join("\n")
+    end
 
+    @@logger_error = Logger.new(STDERR)
     def self.run(options)
 
       # reading the yaml files
-      pipeline = YAML.
-      samples_file =
-      samples_file["samples"].each do |k,v|
-        if v.kind_of? Hash
-          samples_file["samples"][k] = Hash[samples_file["samples"][k].map{ |key, value| [key.to_s, value.to_s] }]
-        else
-          samples_file["samples"][k] = v.to_s
-        end
-      end
-      # make sure everything in Samples and Resources is converted to string
-      #samples_file["samples"] = Hash[samples_file["samples"].map{ |key, value| [key.to_s, value.to_s] }]
-      samples_file["resources"] = Hash[samples_file["resources"].map {|k,v| [k.to_s, v.to_s]}]
-
+      pipeline = YAML.load ERB.new(File.read(options[:pipeline])).result(binding)
+      samples_file = load_samples_file options[:samples_file]
       # pre-running checks
       check_steps(options[:steps],pipeline)
       check_samples(options[:samples],samples_file) if options[:samples]
@@ -50,7 +44,7 @@ module Bio
 
       unless run_multi # there are no multi-samples steps, so iterate on samples and create one job per sample
         samples_list.each_key do |sample_name|
-          sample = Bio::Pipengine::Sample.new(sample_name,samples_list[sample_name])
+          sample = Bio::Pipengine::Sample.new(sample_name,samples_list[sample_name],options[:group])
           create_job(samples_file,pipeline,samples_list,options,sample)
         end
       end
@@ -62,14 +56,14 @@ module Bio
 
       if step_multi.include? false
         if step_multi.uniq.size > 1
-
+          @@logger_error.error "\nAbort! You are trying to run both multi-samples and single sample steps in the same job".red
           exit
         else
          return false
         end
       else
         samples_obj = {}
-        samples_list.each_key {|sample_name| samples_obj[sample_name] = Bio::Pipengine::Sample.new(sample_name,samples_list[sample_name])}
+        samples_list.each_key {|sample_name| samples_obj[sample_name] = Bio::Pipengine::Sample.new(sample_name,samples_list[sample_name],options[:group])}
         create_job(samples_file,pipeline,samples_list,options,samples_obj)
         return true
       end
@@ -106,13 +100,12 @@ module Bio
         self.add_job(job, pipeline, step_name, sample)
       end
 
-      if options[:dry]
+      if options[:dry]
         job.to_script(options)
       else
-
-
-
-      end
+        job.to_script(options)
+        job.submit
+      end
     end
 
     # check if sample exists
@@ -127,7 +120,7 @@ module Bio
         end
       end
       unless samples_names.include? sample
-
+        @@logger_error.error "Sample \"#{sample}\" does not exist in sample file!".red
         exit
       end
     end
@@ -137,7 +130,7 @@ module Bio
     def self.check_steps(passed_steps,pipeline)
       passed_steps.each do |step|
         unless pipeline["steps"].keys.include? step
-
+          @@logger_error.error "Step \"#{step}\" does not exist in pipeline file!".red
           exit
         end
       end
@@ -177,58 +170,74 @@ module Bio
       end
     end
 
-    # show running jobs information
-    def self.show_stats(job_ids)
-      stats = TORQUE::Qstat.new
-      if job_ids.first == "all"
-        stats.display
-      else
-        stats.display(:job_ids => job_ids)
-      end
-    end
-
-    # delete running jobs from the scheduler
-    def self.delete_jobs(job_ids)
-      include TORQUE
-      if job_ids == ["all"]
-        Qdel.rm_all
-      else
-        job_ids.each {|job_id| Qdel.rm job_id}
-      end
-    end #delete_jobs
+    # # show running jobs information
+    # def self.show_stats(job_ids)
+    #   stats = TORQUE::Qstat.new
+    #   if job_ids.first == "all"
+    #     stats.display
+    #   else
+    #     stats.display(:job_ids => job_ids)
+    #   end
+    # end
+    #
+    # # delete running jobs from the scheduler
+    # def self.delete_jobs(job_ids)
+    #   include TORQUE
+    #   if job_ids == ["all"]
+    #     Qdel.rm_all
+    #   else
+    #     job_ids.each {|job_id| Qdel.rm job_id}
+    #   end
+    # end #delete_jobs
 
     # check if required configuration exists
-    def self.check_config
-      unless File.exists?("#{Dir.home}/.torque_rm.yaml")
-        ARGV.clear
-        current_user = Etc.getlogin
-        puts "\nIt seems you are running PipEngine for the first time. Please fill in the following information:"
-        print "\nHostname or IP address of authorized server from where jobs will be submitted: ".light_blue
-        server = gets.chomp
-        print "\n"
-        print "Specify the username you will be using to connect and submit jobs [#{current_user}]: ".light_blue
-        username = gets.chomp
-        username = (username == "") ? current_user : username
-        puts "Attempting connection to the server...".green
-        path = `ssh #{username}@#{server} -t "which qsub"`.split("/qsub").first
-        unless path=~/\/\S+\/\S+/
-          warn "Connection problems detected! Please check that you are able to connect to '#{server}' as '#{username}' via ssh.".red
-        else
-          file = File.open("#{Dir.home}/.torque_rm.yaml","w")
-          file.write({:hostname => server, :path => path, :user => username}.to_yaml)
-          file.close
-          puts "First time configuration completed!".green
-          puts "It is strongly recommended to setup a password-less SSH connection to use PipEngine.".green
-          exit
-        end
-      end
-    end #check_config
+    # def self.check_config
+    #   unless File.exists?("#{Dir.home}/.torque_rm.yaml")
+    #     ARGV.clear
+    #     current_user = Etc.getlogin
+    #     puts "\nIt seems you are running PipEngine for the first time. Please fill in the following information:"
+    #     print "\nHostname or IP address of authorized server from where jobs will be submitted: ".light_blue
+    #     server = gets.chomp
+    #     print "\n"
+    #     print "Specify the username you will be using to connect and submit jobs [#{current_user}]: ".light_blue
+    #     username = gets.chomp
+    #     username = (username == "") ? current_user : username
+    #     puts "Attempting connection to the server...".green
+    #     path = `ssh #{username}@#{server} -t "which qsub"`.split("/qsub").first
+    #     unless path=~/\/\S+\/\S+/
+    #       warn "Connection problems detected! Please check that you are able to connect to '#{server}' as '#{username}' via ssh.".red
+    #     else
+    #       file = File.open("#{Dir.home}/.torque_rm.yaml","w")
+    #       file.write({:hostname => server, :path => path, :user => username}.to_yaml)
+    #       file.close
+    #       puts "First time configuration completed!".green
+    #       puts "It is strongly recommended to setup a password-less SSH connection to use PipEngine.".green
+    #       exit
+    #     end
+    #   end
+    # end #check_config
 
     def self.add_job(job, pipeline, step_name, sample)
       step = Bio::Pipengine::Step.new(step_name,pipeline["steps"][step_name]) # parsing step instructions
       self.add_job(job, pipeline, step.pre, sample) if step.has_prerequisite?
       job.add_step(step,sample) # adding step command lines to the job
     end #add_job
+
+    def self.load_samples_file(file)
+      samples_file = YAML.load_file file
+      samples_file["samples"].each do |k,v|
+        if v.kind_of? Hash
+          samples_file["samples"][k] = Hash[samples_file["samples"][k].map{ |key, value| [key.to_s, value.to_s] }]
+        else
+          samples_file["samples"][k] = v.to_s
+        end
+      end
+      # make sure everything in Samples and Resources is converted to string
+      #samples_file["samples"] = Hash[samples_file["samples"].map{ |key, value| [key.to_s, value.to_s] }]
+      samples_file["resources"] = Hash[samples_file["resources"].map {|k,v| [k.to_s, v.to_s]}]
+      samples_file
+    end
+
 
   end
 end
```
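The new `load_samples_file` helper extracted in the diff above normalizes every sample entry (and every resource) to a String, so that later string operations such as `path_string.split(",")` cannot fail when YAML parses a value as an integer. A minimal, self-contained sketch of that normalization (the input data is made up):

```ruby
require 'yaml'

# Mirrors the normalization in Bio::Pipengine.load_samples_file: nested
# sample hashes have their keys and values stringified, flat sample
# entries and resources are stringified directly.
def load_samples(yaml_text)
  samples_file = YAML.load(yaml_text)
  samples_file["samples"].each do |k, v|
    samples_file["samples"][k] =
      if v.kind_of?(Hash)
        Hash[v.map { |key, value| [key.to_s, value.to_s] }]
      else
        v.to_s
      end
  end
  samples_file["resources"] = Hash[samples_file["resources"].map { |k, v| [k.to_s, v.to_s] }]
  samples_file
end

data = load_samples(<<~YAML)
  resources:
    output: /data/out
  samples:
    sampleA: /data/reads/A_1.fastq,/data/reads/A_2.fastq
    groupB:
      sampleB1: 42
YAML
```

After loading, `data["samples"]["groupB"]["sampleB1"]` is the string `"42"` rather than the integer `42`, so the comma-splitting done in `Sample#initialize` works uniformly.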
data/lib/bio/pipengine/job.rb
CHANGED

```diff
@@ -1,8 +1,11 @@
 module Bio
+
   module Pipengine
 
     class Job
 
+      @@logger = Logger.new(STDOUT)
+      @@logger_error = Logger.new(STDERR)
       # a Job object holds information on a job to be submitted
       # samples_groups and samples_obj are used to store information in case of steps that require to combine info
       # from multiple samples
@@ -112,38 +115,28 @@ module Bio
 
       end
 
-
-
-
-
-
-
-
+      def to_script(options)
+        File.open(self.output+"/"+self.name+'.pbs','w') do |file|
+          file.puts "#!/usr/bin/env bash"
+          file.puts "#PBS -N #{self.name}"
+          file.puts "#PBS -d #{self.output}"
+          file.puts "#PBS -q #{options[:pbs_queue]}"
+          if options[:pbs_opts]
+            file.puts "#PBS -l #{options[:pbs_opts].join(",")}"
           else
             l_string = []
             l_string << "nodes=#{self.nodes}:ppn=#{self.cpus}"
             l_string << "mem=#{self.mem}" if self.mem
-
-
-
-            torque_job.M = options[:mail_exit]
-          end
-          if options[:mail_start]
-            torque_job.m = "b"
-            torque_job.M = options[:mail_start]
-          end
-        end
-        torque_job.q = options[:pbs_queue] if options[:pbs_queue]
-        torque_job.script = self.command_line.join("\n")+"\n"
-      end
-    end
-
-    def to_script(options)
-      File.open(self.name+'.sh','w') do |file|
-        file.puts "#!/usr/bin/env bash -l"
-        file.puts self.command_line.join("\n")
+            file.puts "#PBS -l #{l_string.join(",")}"
+          end
+          file.puts self.command_line.join("\n")
         end
       end
+
+      def submit
+        job_id = `qsub #{self.output}+"/"+#{self.name}.pbs`
+        @@logger.info "#{job_id}".green
+      end
 
       private
 
@@ -181,13 +174,13 @@ module Bio
 
         # for placeholders like <mapping/sample>
         tmp_cmd.scan(/<(\S+)\/sample>/).map {|e| e.first}.each do |input_folder|
-
+          @@logger.info "Directory #{self.output+"/"+sample.name+"/"+input_folder} not found".magenta unless Dir.exists? self.output+"/"+sample.name+"/"+input_folder
          tmp_cmd = tmp_cmd.gsub(/<#{input_folder}\/sample>/,self.output+"/"+sample.name+"/"+input_folder+"/"+sample.name)
         end
 
         # for placeholders like <mapping/>
         tmp_cmd.scan(/<(\S+)\/>/).map {|e| e.first}.each do |input_folder|
-
+          @@logger.info "Directory #{self.output+"/"+sample.name+"/"+input_folder} not found".magenta unless Dir.exists? self.output+"/"+sample.name+"/"+input_folder
           tmp_cmd = tmp_cmd.gsub(/<#{input_folder}\/>/,self.output+"/"+sample.name+"/"+input_folder+"/")
         end
         return tmp_cmd
```
data/lib/bio/pipengine/sample.rb
CHANGED

```diff
@@ -2,10 +2,31 @@ module Bio
   module Pipengine
     class Sample
       # Sample holds all the information on a sample and its original input path (or multiple paths)
-      attr_accessor :path
-      def initialize(name,path_string)
+      attr_accessor :path
+      def initialize(name,path_string,group)
         @path = path_string.split(",")
-        @name = name
+        @name = name
+        @group = group
+      end
+
+      def name=(name)
+        @name
+      end
+
+      def group=(group)
+        @group
+      end
+
+      def group
+        @group
+      end
+
+      def x_name
+        "#{@group}/#{@name}"
+      end
+
+      def name
+        @name
       end
     end
   end
```
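A caveat on the diff above: the writers `name=` and `group=` return the instance variable without assigning their argument, so `sample.name = "x"` is silently a no-op. `attr_accessor` generates correct reader/writer pairs. A minimal sketch of the intended behavior (the class name is a stand-in, not PipEngine's actual class):

```ruby
# SampleSketch shows the Sample interface with working accessors:
# attr_accessor defines readers and writers that actually assign.
class SampleSketch
  attr_accessor :name, :group

  def initialize(name, path_string, group)
    @path  = path_string.split(",")  # one sample may have several input paths
    @name  = name
    @group = group
  end

  # group-qualified name, as in the new x_name method
  def x_name
    "#{@group}/#{@name}"
  end
end

s = SampleSketch.new("sampleA", "/data/A_1.fastq,/data/A_2.fastq", "groupB")
s.name = "renamed"   # takes effect, unlike the hand-written name= above
puts s.x_name        # groupB/renamed
```

With `def name=(name); @name; end`, the same assignment would leave `@name` untouched and `x_name` would still report `groupB/sampleA`.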
metadata
CHANGED

```diff
@@ -1,14 +1,15 @@
 --- !ruby/object:Gem::Specification
 name: bio-pipengine
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.8.0
 platform: ruby
 authors:
 - Francesco Strozzi
+- Raoul Jean Pierre Bonnal
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2016-01-28 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: trollop
@@ -39,35 +40,21 @@ dependencies:
     - !ruby/object:Gem::Version
       version: '0'
 - !ruby/object:Gem::Dependency
-  name:
+  name: warbler
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version:
-  type: :runtime
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-- !ruby/object:Gem::Dependency
-  name: terminal-table
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
+        version: 1.4.8
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version:
+        version: 1.4.8
 - !ruby/object:Gem::Dependency
-  name:
+  name: jeweler
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
@@ -80,50 +67,10 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-- !ruby/object:Gem::Dependency
-  name: rdoc
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '3.12'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '3.12'
-- !ruby/object:Gem::Dependency
-  name: bundler
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">"
-      - !ruby/object:Gem::Version
-        version: 1.0.0
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">"
-      - !ruby/object:Gem::Version
-        version: 1.0.0
-- !ruby/object:Gem::Dependency
-  name: jeweler
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: 1.8.4
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: 1.8.4
 description: A pipeline manager
-email:
+email:
+- francesco.strozzi@gmail.com
+- ilpuccio.febo@gmail.com
 executables:
 - pipengine
 extensions: []
@@ -160,7 +107,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.
+rubygems_version: 2.4.5
 signing_key:
 specification_version: 4
 summary: A pipeline manager
```