bio-grid 0.2.5 → 0.2.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/Gemfile +2 -1
- data/Gemfile.lock +6 -0
- data/README.md +16 -4
- data/VERSION +1 -1
- data/bin/bio-grid +6 -2
- data/bio-grid.gemspec +7 -1
- data/lib/bio/grid.rb +8 -1
- data/lib/bio/grid/job.rb +10 -8
- data/lib/bioruby-grid.rb +1 -0
- metadata +34 -2
data/Gemfile
CHANGED
@@ -2,7 +2,7 @@ source "http://rubygems.org"
 # Add dependencies required to use your gem here.
 # Example:
 # gem "activesupport", ">= 2.3.5"
-
+gem "uuid"
 # Add dependencies to develop your gem here.
 # Include everything needed to run rake, tests, features, etc.
 group :development do
@@ -10,4 +10,5 @@ group :development do
 gem "rdoc", "~> 3.12"
 gem "bundler", "> 1.0.0"
 gem "jeweler", "~> 1.8.4"
+gem "uuid"
 end
data/Gemfile.lock
CHANGED
@@ -9,6 +9,8 @@ GEM
 rake
 rdoc
 json (1.7.5)
+macaddr (1.6.1)
+systemu (~> 2.5.0)
 rake (0.9.2.2)
 rdoc (3.12)
 json (~> 1.4)
@@ -20,6 +22,9 @@ GEM
 rspec-expectations (2.8.0)
 diff-lcs (~> 1.1.2)
 rspec-mocks (2.8.0)
+systemu (2.5.2)
+uuid (2.3.5)
+macaddr (~> 1.0)
 
 PLATFORMS
 ruby
@@ -29,3 +34,4 @@ DEPENDENCIES
 jeweler (~> 1.8.4)
 rdoc (~> 3.12)
 rspec (~> 2.8.0)
+uuid
data/README.md
CHANGED
@@ -82,12 +82,13 @@ For a complete list of current BioGrid parameters, type "bio-grid -h":
 -s, --split-number NUMBER Number of input files (or group of files) to use per job. If all the files in a location need to be used for a single job, just specify 'all'
 -p, --processes PROCESSES Number of processes per job
 -c, --command-line COMMANDLINE Command line to be executed
--o, --output OUTPUT Output folder
+-o, --output OUTPUT Output folder. Needs a <output> placeholder in the command line
 -r, --copy-to LOCATION Copy the output once a job is terminated
 -e, --erease-output Delete job output data when completed (useful to delete output temporary files on a computing node)
+-a, --params PARAM1,PARAM2... List of parameters to use for testing. Needs a <param> placeholder in the command line
 -d, --dry Dry run. Just write the job scripts without sending them in queue (for debugging or testing)
 -t, --test Start the mapping only with the first group of reads (e.g. for testing parameters)
--i, --input INPUT1,INPUT2... Location where to find input files (accepts wildcards).
+-i, --input INPUT1,INPUT2... Location where to find input files (accepts wildcards). Needs <input(1,2,3...> placeholder(s) in the command line
 --sep SEPARATOR Input file separator [Default: , ]
 --keep-scripts Keep all the running scripts created for all the jobs
 -h, --help Display this screen
@@ -99,7 +100,7 @@ Advanced stuff
 Ok let's unleash the potential of BioGrid.
 By putting together an automatic system to generate and submit jobs on a queue systems and a command line template approach, we can do some interesting things.
 
-
+Numerical parameters sampling and testing
 -------------------------------
 
 The tipical scenario is when I have to run a tool on a new dataset and I would like to test different parameters to asses which are the better ones for my analysis.
@@ -119,7 +120,18 @@ So in this case, the ```-L``` parameter will take 6 different values: 22, 24, 26
 
 Last but not least, the ```-t``` option is essential so that only a single job per input file (or group of files) will be executed. Sampling parameters values is a typical combinatorial approach and this option avoids generating hundreds of different jobs only to sample a parameter. Coming back to the initial example, if I have 60 pairs of FastQ files, without the ```-t``` option, the job number will be 60x6 = 360, which is just crazy when you only want to test different parameter values.
 
-
+Others parameters sampling
+--------------------------
+
+If you want to sample non-numerical parameters, with BioGrid it is possible to use the ```--params``` option. So for instance, if I want to run Bowtie on my dataset to assess the results differences using the ```--sensitive```, ```--very-sensitive``` and ```--fast``` options, I can do it easely in this way:
+
+```shell
+bio-grid -i "/data/Project_X/Sample_Y/*_R1_*.fastq.gz","/data/Project_X/Sample_Y/*_R2_*.fastq.gz" -n bowtie_mapping -c "/software/bowtie2 -x /genomes/genome_index -p 8 <param> -1 <input1> -2 <input2> > <output>.sam" -o /data/Project_X/Sample_Y_mapping -s 1 -p 8 -r /results/Sample_Y_mapping -e --param "--sensitive","--very-sensitive","--fast" -t
+```
+
+In this case, the key points are the ```<param>``` placeholder in the command line and the corresponding ```--params``` options in BioGrid, which specify a list of parameters to be used to generate and run different jobs, each one with a different parameter in the list. Again, even in this case, it is recommended to do parameters testing using the ```-t``` option, which only runs a single job and not the full job array.
+
+So far, BioGrid does not support, for each run, sampling more than one parameter at the same time.
 
 Contributing to bioruby-grid
 ============================
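A minimal Ruby sketch of what the new `--params` option does with the README's Bowtie example: each listed value replaces the `<param>` placeholder, giving one command line per value, plus a job-name suffix in the `-param:VALUE` form that `lib/bio/grid.rb` assigns to `options[:parameter_value]`. The helper `expand_params` is hypothetical; it mirrors the `gsub(/<param>|<parameter>/, p)` call in this diff rather than reproducing BioGrid's own code.

```ruby
# Hypothetical helper mirroring the --params branch added to lib/bio/grid.rb:
# every value replaces the <param> (or <parameter>) placeholder, producing one
# command line and one job-name suffix per value.
def expand_params(cmd_template, params)
  params.map do |p|
    {
      cmd_line: cmd_template.gsub(/<param>|<parameter>/, p),
      suffix: "-param:#{p}" # same format as job.options[:parameter_value]
    }
  end
end

template = "/software/bowtie2 -x /genomes/genome_index -p 8 <param> " \
           "-1 <input1> -2 <input2> > <output>.sam"
expand_params(template, ["--sensitive", "--very-sensitive", "--fast"]).each do |job|
  puts "#{job[:suffix]}  #{job[:cmd_line]}"
end
```

With three values this yields three jobs; combined with `-t`, only the first input group is used for each of them, so the run stays at three jobs.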
data/VERSION
CHANGED
@@ -1 +1 @@
-0.2.
+0.2.6
data/bin/bio-grid
CHANGED
@@ -26,7 +26,7 @@ optparse = OptionParser.new do |opts|
 options[:cmd] = cmd
 end
 
-opts.on("-o","--output OUTPUT","Output folder") do |out|
+opts.on("-o","--output OUTPUT","Output folder. Needs a <output> placeholder in the command line") do |out|
 options[:output] = out
 end
 
@@ -38,13 +38,17 @@ optparse = OptionParser.new do |opts|
 options[:clean] = true
 end
 
+opts.on("-a","--params PARAM1,PARAM2...",Array,"List of parameters to use for testing. Needs a <param> placeholder in the command line") do |params|
+options[:params] = params
+end
+
 opts.on("-d","--dry","Dry run. Just write the job scripts without sending them in queue (for debugging or testing)") {options[:dry] = true}
 
 opts.on("-t","--test","Start the mapping only with the first group of reads (e.g. for testing parameters)") do |test|
 options[:test] = true
 end
 
-opts.on("-i","--input INPUT1,INPUT2...",Array,"Location where to find input files (accepts wildcards).
+opts.on("-i","--input INPUT1,INPUT2...",Array,"Location where to find input files (accepts wildcards). Needs <input(1,2,3...> placeholder(s) in the command line") do |input|
 options[:input] = input
 end
 
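The new switch relies on Ruby's standard `OptionParser`: declaring the argument type as `Array` makes it split the value on commas. The sketch below rebuilds only `-a/--params` and `-t/--test` inside a hypothetical `parse_cli` helper, so the option handling can be tried in isolation from the rest of `bin/bio-grid`.

```ruby
require "optparse"

# Hypothetical, reduced version of the bin/bio-grid option parser: only the
# new -a/--params switch and -t/--test are declared here.
def parse_cli(argv)
  options = {}
  OptionParser.new do |opts|
    # The Array type tells OptionParser to split "a,b,c" into ["a", "b", "c"].
    opts.on("-a", "--params PARAM1,PARAM2...", Array,
            "List of parameters to use for testing") do |params|
      options[:params] = params
    end
    opts.on("-t", "--test", "Run only the first group of inputs") do
      options[:test] = true
    end
  end.parse!(argv)
  options
end

p parse_cli(["--params", "--sensitive,--very-sensitive,--fast", "-t"])
```

The shell joins `"--sensitive","--very-sensitive","--fast"` into one word, so OptionParser receives `--sensitive,--very-sensitive,--fast` as a single required argument and splits it. OptionParser also accepts unambiguous prefixes of long options, which is presumably why the README's `--param` spelling resolves to `--params`.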
data/bio-grid.gemspec
CHANGED
@@ -5,7 +5,7 @@
 
 Gem::Specification.new do |s|
 s.name = "bio-grid"
-s.version = "0.2.
+s.version = "0.2.6"
 
 s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
 s.authors = ["Francesco Strozzi"]
@@ -44,21 +44,27 @@ Gem::Specification.new do |s|
 s.specification_version = 3
 
 if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
+s.add_runtime_dependency(%q<uuid>, [">= 0"])
 s.add_development_dependency(%q<rspec>, ["~> 2.8.0"])
 s.add_development_dependency(%q<rdoc>, ["~> 3.12"])
 s.add_development_dependency(%q<bundler>, ["> 1.0.0"])
 s.add_development_dependency(%q<jeweler>, ["~> 1.8.4"])
+s.add_development_dependency(%q<uuid>, [">= 0"])
 else
+s.add_dependency(%q<uuid>, [">= 0"])
 s.add_dependency(%q<rspec>, ["~> 2.8.0"])
 s.add_dependency(%q<rdoc>, ["~> 3.12"])
 s.add_dependency(%q<bundler>, ["> 1.0.0"])
 s.add_dependency(%q<jeweler>, ["~> 1.8.4"])
+s.add_dependency(%q<uuid>, [">= 0"])
 end
 else
+s.add_dependency(%q<uuid>, [">= 0"])
 s.add_dependency(%q<rspec>, ["~> 2.8.0"])
 s.add_dependency(%q<rdoc>, ["~> 3.12"])
 s.add_dependency(%q<bundler>, ["> 1.0.0"])
 s.add_dependency(%q<jeweler>, ["~> 1.8.4"])
+s.add_dependency(%q<uuid>, [">= 0"])
 end
 end
 
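The gemspec change declares `uuid` as a runtime dependency and, following the Gemfile, also as a development dependency. As a rough sketch, an in-memory `Gem::Specification` carrying only the name, the version and two dependencies shows how RubyGems keeps the two kinds apart; a real gemspec needs more fields (summary, files and so on), which are omitted here.

```ruby
require "rubygems"

# Sketch: a minimal in-memory spec, not the generated bio-grid.gemspec.
spec = Gem::Specification.new do |s|
  s.name = "bio-grid"
  s.version = "0.2.6"
  s.add_runtime_dependency("uuid", ">= 0")          # needed by users of the gem
  s.add_development_dependency("rspec", "~> 2.8.0") # needed only to develop it
end

spec.runtime_dependencies.each { |d| puts "runtime:     #{d.name} #{d.requirement}" }
spec.development_dependencies.each { |d| puts "development: #{d.name} #{d.requirement}" }
```

Only runtime dependencies are installed alongside the gem by `gem install`, which is why `uuid` has to appear there for `Job#initialize` to work on a user's machine.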
data/lib/bio/grid.rb
CHANGED
@@ -20,7 +20,14 @@ module Bio
 range.each do |value|
 cmd_line = options[:cmd].gsub(/<(\d+),(\d+)(,\d+)*>/,value.to_s)
 job = Bio::Grid::Job.new(options) # inherit global options
-job.options[:parameter_value] = "-param
+job.options[:parameter_value] = "-param:#{value}"
+job.execute(cmd_line,inputs,input1,groups,index)
+end
+elsif options[:params]
+options[:params].each do |p|
+cmd_line = options[:cmd].gsub(/<param>|<parameter>/,p)
+job = Bio::Grid::Job.new(options)
+job.options[:parameter_value] = "-param:#{p}"
 job.execute(cmd_line,inputs,input1,groups,index)
 end
 else
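The numeric branch substitutes every value of a `range` into a placeholder matching `/<(\d+),(\d+)(,\d+)*>/`, but how `range` is built is not part of this diff. The sketch below assumes the three numbers mean start, stop and step (step defaulting to 1), so that, for example, `<22,32,2>` would give six values; `expand_range` and that reading of the placeholder are both hypothetical, and the sketch accepts at most one step value.

```ruby
# Assumed placeholder form: <start,stop> or <start,stop,step>.
PLACEHOLDER = /<(\d+),(\d+)(?:,(\d+))?>/

# Hypothetical helper: one command line and one "-param:VALUE" suffix per value.
def expand_range(cmd_template)
  m = PLACEHOLDER.match(cmd_template)
  return [] unless m # no numeric placeholder, nothing to sample

  start, stop = m[1].to_i, m[2].to_i
  step = (m[3] || 1).to_i
  start.step(stop, step).map do |value| # stop is inclusive
    { cmd_line: cmd_template.gsub(PLACEHOLDER, value.to_s),
      suffix: "-param:#{value}" }
  end
end

expand_range("bowtie2 -L <22,32,2> -x idx").each { |j| puts j[:cmd_line] }
```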
data/lib/bio/grid/job.rb
CHANGED
@@ -2,14 +2,16 @@ module Bio
 class Grid
 class Job
 
-attr_accessor :options, :instructions, :job_output, :runner
+attr_accessor :options, :instructions, :job_output, :runner, :uuid
 def initialize(options)
 @options = options
-self.instructions =
+self.instructions = []
+self.uuid = UUID.new.generate.split("-").first
 end
 
 def set_output_dir
-self.
+output_dir = (self.options[:output_folder]) ? "mkdir -p #{self.job_output}\ncd #{self.job_output}\n" : "mkdir -p #{self.options[:output]}\n"
+self.instructions.insert(1,output_dir)
 end
 
 def set_commandline(cmd_line,inputs,input1,groups,index)
@@ -20,13 +22,13 @@ module Bio
 job_output = ""
 if commandline =~/<output>\.(\S+)/
 extension = $1
-job_output = self.options[:output]+"/#{
+job_output = self.options[:output]+"/#{self.uuid}_"+self.options[:name]+"_output_%03d" % (index+1).to_s + "#{self.options[:parameter_value]}"
 commandline.gsub!(/<output>/,job_output)
 job_output << ".#{extension}"
 else
 self.options[:output_folder] = true
-
-
+job_output = self.options[:output]+"/#{self.uuid}_"+self.options[:name]
+commandline.gsub!(/<output>/,job_output)
 end
 self.instructions << commandline+"\n"
 self.job_output = job_output
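Output paths now start with a short random prefix taken from the first block of a UUID, presumably so that repeated runs with the same `-n` name do not overwrite each other. The sketch reproduces the naming expression from `set_commandline` under one stated swap: `SecureRandom.uuid` from the standard library stands in for the `uuid` gem's `UUID.new.generate`, since both return a hyphenated UUID string whose first block is 8 hex characters. `job_uuid` and `job_output_path` are hypothetical helper names.

```ruby
require "securerandom"

# Stand-in for UUID.new.generate.split("-").first (assumption: SecureRandom
# replaces the uuid gem here; the shape of the result is the same).
def job_uuid
  SecureRandom.uuid.split("-").first
end

# Mirrors the file-mode branch of set_commandline: the 1-based job index is
# zero-padded to three digits and the parameter suffix (if any) is appended.
def job_output_path(output_dir, uuid, name, index, parameter_value = nil)
  output_dir + "/#{uuid}_" + name + format("_output_%03d", index + 1) +
    parameter_value.to_s
end

puts job_output_path("/data/Project_X/Sample_Y_mapping", job_uuid,
                     "bowtie_mapping", 0, "-param:--fast")
```

In `set_commandline` the file extension (for example `.sam`) is appended afterwards by `job_output << ".#{extension}"`; the original writes the padding as `"_output_%03d" % (index+1).to_s`, which Ruby formats the same way as the `format` call used here.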
@@ -48,7 +50,7 @@ module Bio
 def write_runner(filename)
 self.runner = filename
 out = File.open(Dir.pwd+"/"+filename,"w")
-out.write(self.instructions+"\n")
+out.write(self.instructions.join+"\n")
 out.close
 end
 
@@ -63,8 +65,8 @@ module Bio
 
 def execute(command_line,inputs,input1,groups,index)
 self.set_scheduler_options(:pbs) # set script specific options for the scheduling system
-self.set_output_dir
 self.set_commandline(command_line,inputs,input1,groups,index)
+self.set_output_dir
 self.append_options
 job_filename = (self.options[:keep]) ? "job_#{index+1}#{self.options[:parameter_value]}.sh" : "job.sh"
 self.run(job_filename)
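Three `job.rb` changes work together: `instructions` becomes an Array, `set_output_dir` inserts its `mkdir` lines at index 1, and `execute` now calls `set_output_dir` after `set_commandline`, because only `set_commandline` knows whether the output is a folder and what `job_output` is. The `ScriptSketch` class below is a hypothetical model of that sequence; it assumes `set_scheduler_options` contributes exactly one header entry at index 0, which this diff does not show.

```ruby
# Hypothetical model of how a Job assembles its PBS script in 0.2.6.
class ScriptSketch
  attr_reader :instructions

  def initialize(output)
    @output = output
    @instructions = [] # an Array since 0.2.6, no longer a String
  end

  # Assumption: a single header entry at index 0 (real content not shown).
  def set_scheduler_options
    @instructions << "#!/bin/bash\n#PBS -N example\n"
  end

  # An <output> placeholder without a file extension means "use a folder".
  def set_commandline(commandline, job_output)
    @output_folder = !commandline.match?(/<output>\.\S+/)
    @job_output = job_output
    @instructions << commandline.gsub(/<output>/, job_output) + "\n"
  end

  # Must run after set_commandline: it reads @output_folder and @job_output.
  def set_output_dir
    dir = @output_folder ? "mkdir -p #{@job_output}\ncd #{@job_output}\n" : "mkdir -p #{@output}\n"
    @instructions.insert(1, dir) # right after the scheduler header
  end

  # Array + String raises TypeError, hence the join added to write_runner.
  def script
    @instructions.join + "\n"
  end
end

job = ScriptSketch.new("/results")
job.set_scheduler_options
job.set_commandline("tool -o <output>", "/results/abcd1234_run")
job.set_output_dir
print job.script
```

Called in the old order, `set_output_dir` would see neither the folder flag nor `job_output`, and inserting at index 1 only places the `mkdir` after the header if the header is already in the array.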
data/lib/bioruby-grid.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: bio-grid
 version: !ruby/object:Gem::Version
-version: 0.2.
+version: 0.2.6
 prerelease:
 platform: ruby
 authors:
@@ -11,6 +11,22 @@ bindir: bin
 cert_chain: []
 date: 2012-09-24 00:00:00.000000000 Z
 dependencies:
+- !ruby/object:Gem::Dependency
+name: uuid
+requirement: !ruby/object:Gem::Requirement
+none: false
+requirements:
+- - ! '>='
+- !ruby/object:Gem::Version
+version: '0'
+type: :runtime
+prerelease: false
+version_requirements: !ruby/object:Gem::Requirement
+none: false
+requirements:
+- - ! '>='
+- !ruby/object:Gem::Version
+version: '0'
 - !ruby/object:Gem::Dependency
 name: rspec
 requirement: !ruby/object:Gem::Requirement
@@ -75,6 +91,22 @@ dependencies:
 - - ~>
 - !ruby/object:Gem::Version
 version: 1.8.4
+- !ruby/object:Gem::Dependency
+name: uuid
+requirement: !ruby/object:Gem::Requirement
+none: false
+requirements:
+- - ! '>='
+- !ruby/object:Gem::Version
+version: '0'
+type: :development
+prerelease: false
+version_requirements: !ruby/object:Gem::Requirement
+none: false
+requirements:
+- - ! '>='
+- !ruby/object:Gem::Version
+version: '0'
 description: A BioGem to submit jobs on a queue system
 email: francesco.strozzi@gmail.com
 executables:
@@ -114,7 +146,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
 version: '0'
 segments:
 - 0
-hash:
+hash: -4333638046284412790
 required_rubygems_version: !ruby/object:Gem::Requirement
 none: false
 requirements: