bio-grid 0.2.5 → 0.2.6
- data/Gemfile +2 -1
- data/Gemfile.lock +6 -0
- data/README.md +16 -4
- data/VERSION +1 -1
- data/bin/bio-grid +6 -2
- data/bio-grid.gemspec +7 -1
- data/lib/bio/grid.rb +8 -1
- data/lib/bio/grid/job.rb +10 -8
- data/lib/bioruby-grid.rb +1 -0
- metadata +34 -2
data/Gemfile
CHANGED
@@ -2,7 +2,7 @@ source "http://rubygems.org"
 # Add dependencies required to use your gem here.
 # Example:
 # gem "activesupport", ">= 2.3.5"
-
+gem "uuid"
 # Add dependencies to develop your gem here.
 # Include everything needed to run rake, tests, features, etc.
 group :development do
@@ -10,4 +10,5 @@ group :development do
   gem "rdoc", "~> 3.12"
   gem "bundler", "> 1.0.0"
   gem "jeweler", "~> 1.8.4"
+  gem "uuid"
 end
data/Gemfile.lock
CHANGED
@@ -9,6 +9,8 @@ GEM
       rake
       rdoc
     json (1.7.5)
+    macaddr (1.6.1)
+      systemu (~> 2.5.0)
     rake (0.9.2.2)
     rdoc (3.12)
       json (~> 1.4)
@@ -20,6 +22,9 @@ GEM
     rspec-expectations (2.8.0)
       diff-lcs (~> 1.1.2)
     rspec-mocks (2.8.0)
+    systemu (2.5.2)
+    uuid (2.3.5)
+      macaddr (~> 1.0)
 
 PLATFORMS
   ruby
@@ -29,3 +34,4 @@ DEPENDENCIES
   jeweler (~> 1.8.4)
   rdoc (~> 3.12)
   rspec (~> 2.8.0)
+  uuid
data/README.md
CHANGED
@@ -82,12 +82,13 @@ For a complete list of current BioGrid parameters, type "bio-grid -h":
 -s, --split-number NUMBER Number of input files (or group of files) to use per job. If all the files in a location need to be used for a single job, just specify 'all'
 -p, --processes PROCESSES Number of processes per job
 -c, --command-line COMMANDLINE Command line to be executed
--o, --output OUTPUT Output folder
+-o, --output OUTPUT Output folder. Needs a <output> placeholder in the command line
 -r, --copy-to LOCATION Copy the output once a job is terminated
 -e, --erease-output Delete job output data when completed (useful to delete output temporary files on a computing node)
+-a, --params PARAM1,PARAM2... List of parameters to use for testing. Needs a <param> placeholder in the command line
 -d, --dry Dry run. Just write the job scripts without sending them in queue (for debugging or testing)
 -t, --test Start the mapping only with the first group of reads (e.g. for testing parameters)
--i, --input INPUT1,INPUT2... Location where to find input files (accepts wildcards).
+-i, --input INPUT1,INPUT2... Location where to find input files (accepts wildcards). Needs <input(1,2,3...> placeholder(s) in the command line
 --sep SEPARATOR Input file separator [Default: , ]
 --keep-scripts Keep all the running scripts created for all the jobs
 -h, --help Display this screen
@@ -99,7 +100,7 @@ Advanced stuff
 Ok let's unleash the potential of BioGrid.
 By putting together an automatic system to generate and submit jobs on a queue systems and a command line template approach, we can do some interesting things.
 
-
+Numerical parameters sampling and testing
 -------------------------------
 
 The tipical scenario is when I have to run a tool on a new dataset and I would like to test different parameters to asses which are the better ones for my analysis.
@@ -119,7 +120,18 @@ So in this case, the ```-L``` parameter will take 6 different values: 22, 24, 26
 
 Last but not least, the ```-t``` option is essential so that only a single job per input file (or group of files) will be executed. Sampling parameters values is a typical combinatorial approach and this option avoids generating hundreds of different jobs only to sample a parameter. Coming back to the initial example, if I have 60 pairs of FastQ files, without the ```-t``` option, the job number will be 60x6 = 360, which is just crazy when you only want to test different parameter values.
 
-
+Others parameters sampling
+--------------------------
+
+If you want to sample non-numerical parameters, with BioGrid it is possible to use the ```--params``` option. So for instance, if I want to run Bowtie on my dataset to assess the results differences using the ```--sensitive```, ```--very-sensitive``` and ```--fast``` options, I can do it easely in this way:
+
+```shell
+bio-grid -i "/data/Project_X/Sample_Y/*_R1_*.fastq.gz","/data/Project_X/Sample_Y/*_R2_*.fastq.gz" -n bowtie_mapping -c "/software/bowtie2 -x /genomes/genome_index -p 8 <param> -1 <input1> -2 <input2> > <output>.sam" -o /data/Project_X/Sample_Y_mapping -s 1 -p 8 -r /results/Sample_Y_mapping -e --param "--sensitive","--very-sensitive","--fast" -t
+```
+
+In this case, the key points are the ```<param>``` placeholder in the command line and the corresponding ```--params``` options in BioGrid, which specify a list of parameters to be used to generate and run different jobs, each one with a different parameter in the list. Again, even in this case, it is recommended to do parameters testing using the ```-t``` option, which only runs a single job and not the full job array.
+
+So far, BioGrid does not support, for each run, sampling more than one parameter at the same time.
 
 Contributing to bioruby-grid
 ============================
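Reviewer note: the job-count arithmetic in the README text (60 x 6 = 360) can be checked with a short sketch. The helper `job_count` and its arguments are hypothetical and not part of bio-grid. It assumes, as the README describes, one job per input group per sampled value, and that `-t` restricts the run to the first input group only.

```ruby
# Hypothetical helper, not part of bio-grid: estimates how many jobs a
# parameter-sampling run would generate, following the README description.
def job_count(input_groups, sampled_values, test_mode)
  # Assumption: with -t only the first group of inputs is used.
  groups = test_mode ? 1 : input_groups
  groups * sampled_values
end

puts job_count(60, 6, false) # 60 FastQ pairs, 6 sampled values, no -t
puts job_count(60, 6, true)  # same sampling, restricted by -t
```

Under these assumptions the `-t` run scales with the number of sampled values only, which is why the README recommends it for parameter testing.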
data/VERSION
CHANGED
@@ -1 +1 @@
-0.2.
+0.2.6
data/bin/bio-grid
CHANGED
@@ -26,7 +26,7 @@ optparse = OptionParser.new do |opts|
     options[:cmd] = cmd
   end
 
-  opts.on("-o","--output OUTPUT","Output folder") do |out|
+  opts.on("-o","--output OUTPUT","Output folder. Needs a <output> placeholder in the command line") do |out|
     options[:output] = out
   end
 
@@ -38,13 +38,17 @@ optparse = OptionParser.new do |opts|
     options[:clean] = true
   end
 
+  opts.on("-a","--params PARAM1,PARAM2...",Array,"List of parameters to use for testing. Needs a <param> placeholder in the command line") do |params|
+    options[:params] = params
+  end
+
   opts.on("-d","--dry","Dry run. Just write the job scripts without sending them in queue (for debugging or testing)") {options[:dry] = true}
 
   opts.on("-t","--test","Start the mapping only with the first group of reads (e.g. for testing parameters)") do |test|
     options[:test] = true
   end
 
-  opts.on("-i","--input INPUT1,INPUT2...",Array,"Location where to find input files (accepts wildcards).
+  opts.on("-i","--input INPUT1,INPUT2...",Array,"Location where to find input files (accepts wildcards). Needs <input(1,2,3...> placeholder(s) in the command line") do |input|
     options[:input] = input
   end
 
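Reviewer note: the new `-a/--params` switch relies on OptionParser's `Array` type, which splits the argument on commas. The sketch below isolates that one switch so the behaviour can be tried outside bin/bio-grid. The wrapper `parse_params` is a hypothetical name introduced only for this illustration.

```ruby
require "optparse"

# Hypothetical wrapper around a single switch, modelled on the -a/--params
# option added in this release. It is not the actual bin/bio-grid script.
def parse_params(argv)
  options = {}
  parser = OptionParser.new do |opts|
    # The Array type asks OptionParser to split the argument on commas.
    opts.on("-a", "--params PARAM1,PARAM2...", Array,
            "List of parameters to use for testing") do |params|
      options[:params] = params
    end
  end
  parser.parse!(argv)
  options
end

# The shell joins "--sensitive","--very-sensitive","--fast" into one argument.
p parse_params(["--params", "--sensitive,--very-sensitive,--fast"])
```

Because the switch takes a required argument, the value is consumed even though it starts with dashes, which is what the README example depends on.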
data/bio-grid.gemspec
CHANGED
@@ -5,7 +5,7 @@
 
 Gem::Specification.new do |s|
   s.name = "bio-grid"
-  s.version = "0.2.
+  s.version = "0.2.6"
 
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
   s.authors = ["Francesco Strozzi"]
@@ -44,21 +44,27 @@ Gem::Specification.new do |s|
     s.specification_version = 3
 
     if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
+      s.add_runtime_dependency(%q<uuid>, [">= 0"])
       s.add_development_dependency(%q<rspec>, ["~> 2.8.0"])
       s.add_development_dependency(%q<rdoc>, ["~> 3.12"])
       s.add_development_dependency(%q<bundler>, ["> 1.0.0"])
       s.add_development_dependency(%q<jeweler>, ["~> 1.8.4"])
+      s.add_development_dependency(%q<uuid>, [">= 0"])
     else
+      s.add_dependency(%q<uuid>, [">= 0"])
       s.add_dependency(%q<rspec>, ["~> 2.8.0"])
       s.add_dependency(%q<rdoc>, ["~> 3.12"])
       s.add_dependency(%q<bundler>, ["> 1.0.0"])
       s.add_dependency(%q<jeweler>, ["~> 1.8.4"])
+      s.add_dependency(%q<uuid>, [">= 0"])
     end
   else
+    s.add_dependency(%q<uuid>, [">= 0"])
     s.add_dependency(%q<rspec>, ["~> 2.8.0"])
     s.add_dependency(%q<rdoc>, ["~> 3.12"])
     s.add_dependency(%q<bundler>, ["> 1.0.0"])
     s.add_dependency(%q<jeweler>, ["~> 1.8.4"])
+    s.add_dependency(%q<uuid>, [">= 0"])
   end
 end
 
data/lib/bio/grid.rb
CHANGED
@@ -20,7 +20,14 @@ module Bio
   range.each do |value|
     cmd_line = options[:cmd].gsub(/<(\d+),(\d+)(,\d+)*>/,value.to_s)
     job = Bio::Grid::Job.new(options) # inherit global options
-    job.options[:parameter_value] = "-param
+    job.options[:parameter_value] = "-param:#{value}"
+    job.execute(cmd_line,inputs,input1,groups,index)
+  end
+elsif options[:params]
+  options[:params].each do |p|
+    cmd_line = options[:cmd].gsub(/<param>|<parameter>/,p)
+    job = Bio::Grid::Job.new(options)
+    job.options[:parameter_value] = "-param:#{p}"
     job.execute(cmd_line,inputs,input1,groups,index)
   end
 else
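Reviewer note: the new `elsif options[:params]` branch produces one command line per listed parameter by substituting the `<param>` (or `<parameter>`) placeholder. A minimal sketch of that expansion, without the `Job` class, might look as follows. The method name `expand_params` is hypothetical, and the block form of `gsub` is a deliberate variation on the diff so that the parameter is inserted literally.

```ruby
# Hypothetical helper illustrating the <param> expansion in lib/bio/grid.rb.
# It returns one command line per sampled parameter.
def expand_params(template, params)
  params.map do |param|
    # Same pattern as the diff. The block form inserts the value as-is,
    # without interpreting backslash sequences in the replacement.
    template.gsub(/<param>|<parameter>/) { param }
  end
end

template = "/software/bowtie2 -x /genomes/genome_index -p 8 <param> -1 <input1> -2 <input2> > <output>.sam"
expand_params(template, ["--sensitive", "--very-sensitive", "--fast"]).each do |cmd|
  puts cmd
end
```

The other placeholders (`<input1>`, `<input2>`, `<output>`) are left untouched here, since in the gem they are resolved later by `Job#set_commandline`.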
data/lib/bio/grid/job.rb
CHANGED
@@ -2,14 +2,16 @@ module Bio
   class Grid
     class Job
 
-      attr_accessor :options, :instructions, :job_output, :runner
+      attr_accessor :options, :instructions, :job_output, :runner, :uuid
       def initialize(options)
         @options = options
-        self.instructions =
+        self.instructions = []
+        self.uuid = UUID.new.generate.split("-").first
       end
 
       def set_output_dir
-        self.
+        output_dir = (self.options[:output_folder]) ? "mkdir -p #{self.job_output}\ncd #{self.job_output}\n" : "mkdir -p #{self.options[:output]}\n"
+        self.instructions.insert(1,output_dir)
       end
 
       def set_commandline(cmd_line,inputs,input1,groups,index)
@@ -20,13 +22,13 @@ module Bio
         job_output = ""
         if commandline =~/<output>\.(\S+)/
           extension = $1
-          job_output = self.options[:output]+"/#{
+          job_output = self.options[:output]+"/#{self.uuid}_"+self.options[:name]+"_output_%03d" % (index+1).to_s + "#{self.options[:parameter_value]}"
           commandline.gsub!(/<output>/,job_output)
           job_output << ".#{extension}"
         else
           self.options[:output_folder] = true
-
-
+          job_output = self.options[:output]+"/#{self.uuid}_"+self.options[:name]
+          commandline.gsub!(/<output>/,job_output)
         end
         self.instructions << commandline+"\n"
         self.job_output = job_output
@@ -48,7 +50,7 @@ module Bio
       def write_runner(filename)
         self.runner = filename
         out = File.open(Dir.pwd+"/"+filename,"w")
-        out.write(self.instructions+"\n")
+        out.write(self.instructions.join+"\n")
         out.close
       end
 
@@ -63,8 +65,8 @@ module Bio
 
       def execute(command_line,inputs,input1,groups,index)
         self.set_scheduler_options(:pbs) # set script specific options for the scheduling system
-        self.set_output_dir
         self.set_commandline(command_line,inputs,input1,groups,index)
+        self.set_output_dir
         self.append_options
         job_filename = (self.options[:keep]) ? "job_#{index+1}#{self.options[:parameter_value]}.sh" : "job.sh"
         self.run(job_filename)
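Reviewer note: the output naming introduced in `set_commandline` combines a short UUID prefix, the job name, a zero-padded index and the sampled parameter. The sketch below reproduces that scheme in isolation. It is an approximation under stated assumptions: `output_basename` is a hypothetical helper, and `SecureRandom` from the standard library stands in for the `uuid` gem so the example runs without extra dependencies.

```ruby
require "securerandom"

# Hypothetical helper mirroring the file-output branch of set_commandline:
#   <output>/<prefix>_<name>_output_NNN<parameter_value>
# SecureRandom replaces the uuid gem here (an assumption for portability).
def output_basename(output_dir, name, index, parameter_value = "", prefix = nil)
  # Like the diff, keep only the first dash-separated segment of the UUID.
  prefix ||= SecureRandom.uuid.split("-").first
  format("%s/%s_%s_output_%03d%s", output_dir, prefix, name, index + 1, parameter_value)
end

puts output_basename("/data/Project_X/Sample_Y_mapping", "bowtie_mapping", 0, "-param:--fast")
```

The per-job prefix keeps outputs from separate runs with the same job name from overwriting each other in a shared output folder.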
data/lib/bioruby-grid.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: bio-grid
 version: !ruby/object:Gem::Version
-  version: 0.2.
+  version: 0.2.6
 prerelease:
 platform: ruby
 authors:
@@ -11,6 +11,22 @@ bindir: bin
 cert_chain: []
 date: 2012-09-24 00:00:00.000000000 Z
 dependencies:
+- !ruby/object:Gem::Dependency
+  name: uuid
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: rspec
   requirement: !ruby/object:Gem::Requirement
@@ -75,6 +91,22 @@ dependencies:
     - - ~>
       - !ruby/object:Gem::Version
         version: 1.8.4
+- !ruby/object:Gem::Dependency
+  name: uuid
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
 description: A BioGem to submit jobs on a queue system
 email: francesco.strozzi@gmail.com
 executables:
@@ -114,7 +146,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
       segments:
       - 0
-      hash:
+      hash: -4333638046284412790
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements: