bio-grid 0.2.5 → 0.2.6

data/Gemfile CHANGED
@@ -2,7 +2,7 @@ source "http://rubygems.org"
  # Add dependencies required to use your gem here.
  # Example:
  # gem "activesupport", ">= 2.3.5"
-
+ gem "uuid"
  # Add dependencies to develop your gem here.
  # Include everything needed to run rake, tests, features, etc.
  group :development do
@@ -10,4 +10,5 @@ group :development do
  gem "rdoc", "~> 3.12"
  gem "bundler", "> 1.0.0"
  gem "jeweler", "~> 1.8.4"
+ gem "uuid"
  end
@@ -9,6 +9,8 @@ GEM
  rake
  rdoc
  json (1.7.5)
+ macaddr (1.6.1)
+ systemu (~> 2.5.0)
  rake (0.9.2.2)
  rdoc (3.12)
  json (~> 1.4)
@@ -20,6 +22,9 @@ GEM
  rspec-expectations (2.8.0)
  diff-lcs (~> 1.1.2)
  rspec-mocks (2.8.0)
+ systemu (2.5.2)
+ uuid (2.3.5)
+ macaddr (~> 1.0)

  PLATFORMS
  ruby
@@ -29,3 +34,4 @@ DEPENDENCIES
  jeweler (~> 1.8.4)
  rdoc (~> 3.12)
  rspec (~> 2.8.0)
+ uuid
data/README.md CHANGED
@@ -82,12 +82,13 @@ For a complete list of current BioGrid parameters, type "bio-grid -h":
  -s, --split-number NUMBER Number of input files (or group of files) to use per job. If all the files in a location need to be used for a single job, just specify 'all'
  -p, --processes PROCESSES Number of processes per job
  -c, --command-line COMMANDLINE Command line to be executed
- -o, --output OUTPUT Output folder
+ -o, --output OUTPUT Output folder. Needs a <output> placeholder in the command line
  -r, --copy-to LOCATION Copy the output once a job is terminated
  -e, --erease-output Delete job output data when completed (useful to delete output temporary files on a computing node)
+ -a, --params PARAM1,PARAM2... List of parameters to use for testing. Needs a <param> placeholder in the command line
  -d, --dry Dry run. Just write the job scripts without sending them in queue (for debugging or testing)
  -t, --test Start the mapping only with the first group of reads (e.g. for testing parameters)
- -i, --input INPUT1,INPUT2... Location where to find input files (accepts wildcards). You can specify more than one input location, just provide a comma separated list
+ -i, --input INPUT1,INPUT2... Location where to find input files (accepts wildcards). Needs <input(1,2,3...> placeholder(s) in the command line
  --sep SEPARATOR Input file separator [Default: , ]
  --keep-scripts Keep all the running scripts created for all the jobs
  -h, --help Display this screen
@@ -99,7 +100,7 @@ Advanced stuff
  Ok let's unleash the potential of BioGrid.
  By putting together an automatic system to generate and submit jobs on a queue system and a command line template approach, we can do some interesting things.

- Parameters sampling and testing
+ Numerical parameters sampling and testing
  -------------------------------

  The typical scenario is when I have to run a tool on a new dataset and I would like to test different parameters to assess which are the best ones for my analysis.
@@ -119,7 +120,18 @@ So in this case, the ```-L``` parameter will take 6 different values: 22, 24, 26

  Last but not least, the ```-t``` option is essential so that only a single job per input file (or group of files) will be executed. Sampling parameter values is a typical combinatorial approach and this option avoids generating hundreds of different jobs only to sample a parameter. Coming back to the initial example, if I have 60 pairs of FastQ files, without the ```-t``` option, the job number will be 60x6 = 360, which is just crazy when you only want to test different parameter values.

- So far, BioGrid does not support sampling more than one parameter at the same time.
+ Other parameters sampling
+ --------------------------
+
+ If you want to sample non-numerical parameters, BioGrid provides the ```--params``` option. For instance, if I want to run Bowtie on my dataset to assess how the results differ between the ```--sensitive```, ```--very-sensitive``` and ```--fast``` options, I can do it easily in this way:
+
+ ```shell
+ bio-grid -i "/data/Project_X/Sample_Y/*_R1_*.fastq.gz","/data/Project_X/Sample_Y/*_R2_*.fastq.gz" -n bowtie_mapping -c "/software/bowtie2 -x /genomes/genome_index -p 8 <param> -1 <input1> -2 <input2> > <output>.sam" -o /data/Project_X/Sample_Y_mapping -s 1 -p 8 -r /results/Sample_Y_mapping -e --params "--sensitive","--very-sensitive","--fast" -t
+ ```
+
+ In this case, the key points are the ```<param>``` placeholder in the command line and the corresponding ```--params``` option in BioGrid, which specifies a list of parameters used to generate and run different jobs, each one with a different parameter from the list. Again, it is recommended to test parameters using the ```-t``` option, which only runs a single job and not the full job array.
+
+ So far, BioGrid does not support sampling more than one parameter at the same time in a single run.

  Contributing to bioruby-grid
  ============================
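
Note on the `<param>` placeholder documented in the README hunk above: the sketch below shows, in a hedged way, how one command-line template could be expanded into one command per listed value. The `expand_params` helper and the shortened template string are hypothetical and are not part of BioGrid; only the substitution pattern follows the `gsub` introduced in this release.

```ruby
# Hypothetical helper (not part of BioGrid): builds one command line per
# value by replacing the <param> placeholder in the template.
def expand_params(template, params)
  params.map { |p| template.gsub(/<param>|<parameter>/, p) }
end

# Shortened, illustrative template; the other placeholders are left alone.
template = "bowtie2 -x genome_index <param> -1 <input1> -2 <input2>"
commands = expand_params(template, ["--sensitive", "--very-sensitive", "--fast"])

commands.each { |c| puts c }
```

Each value yields an independent command, which is why the README recommends `-t` while testing: without it, every input group would be combined with every value.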
data/VERSION CHANGED
@@ -1 +1 @@
- 0.2.5
+ 0.2.6
@@ -26,7 +26,7 @@ optparse = OptionParser.new do |opts|
  options[:cmd] = cmd
  end

- opts.on("-o","--output OUTPUT","Output folder") do |out|
+ opts.on("-o","--output OUTPUT","Output folder. Needs a <output> placeholder in the command line") do |out|
  options[:output] = out
  end

@@ -38,13 +38,17 @@ optparse = OptionParser.new do |opts|
  options[:clean] = true
  end

+ opts.on("-a","--params PARAM1,PARAM2...",Array,"List of parameters to use for testing. Needs a <param> placeholder in the command line") do |params|
+ options[:params] = params
+ end
+
  opts.on("-d","--dry","Dry run. Just write the job scripts without sending them in queue (for debugging or testing)") {options[:dry] = true}

  opts.on("-t","--test","Start the mapping only with the first group of reads (e.g. for testing parameters)") do |test|
  options[:test] = true
  end

- opts.on("-i","--input INPUT1,INPUT2...",Array,"Location where to find input files (accepts wildcards). You can specify more than one input location, just provide a comma separated list") do |input|
+ opts.on("-i","--input INPUT1,INPUT2...",Array,"Location where to find input files (accepts wildcards). Needs <input(1,2,3...> placeholder(s) in the command line") do |input|
  options[:input] = input
  end
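
Note on the new `--params` option: the hunk above declares it with `Array` as the argument type. A minimal, self-contained sketch of what that does in Ruby's standard `OptionParser` follows; the argv passed to `parse!` is illustrative only, and the `=` form is used here simply to keep values that begin with dashes attached to the option.

```ruby
require 'optparse'

options = {}
parser = OptionParser.new do |opts|
  # Declaring the argument type as Array makes OptionParser split the
  # value on commas before handing it to the block.
  opts.on("-a", "--params PARAM1,PARAM2...", Array, "List of parameters to use for testing") do |params|
    options[:params] = params
  end
end

# Illustrative argv, not taken from the gem's tests.
parser.parse!(["--params=--sensitive,--very-sensitive,--fast"])

p options[:params]
```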
 
@@ -5,7 +5,7 @@

  Gem::Specification.new do |s|
  s.name = "bio-grid"
- s.version = "0.2.5"
+ s.version = "0.2.6"

  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
  s.authors = ["Francesco Strozzi"]
@@ -44,21 +44,27 @@ Gem::Specification.new do |s|
  s.specification_version = 3

  if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
+ s.add_runtime_dependency(%q<uuid>, [">= 0"])
  s.add_development_dependency(%q<rspec>, ["~> 2.8.0"])
  s.add_development_dependency(%q<rdoc>, ["~> 3.12"])
  s.add_development_dependency(%q<bundler>, ["> 1.0.0"])
  s.add_development_dependency(%q<jeweler>, ["~> 1.8.4"])
+ s.add_development_dependency(%q<uuid>, [">= 0"])
  else
+ s.add_dependency(%q<uuid>, [">= 0"])
  s.add_dependency(%q<rspec>, ["~> 2.8.0"])
  s.add_dependency(%q<rdoc>, ["~> 3.12"])
  s.add_dependency(%q<bundler>, ["> 1.0.0"])
  s.add_dependency(%q<jeweler>, ["~> 1.8.4"])
+ s.add_dependency(%q<uuid>, [">= 0"])
  end
  else
+ s.add_dependency(%q<uuid>, [">= 0"])
  s.add_dependency(%q<rspec>, ["~> 2.8.0"])
  s.add_dependency(%q<rdoc>, ["~> 3.12"])
  s.add_dependency(%q<bundler>, ["> 1.0.0"])
  s.add_dependency(%q<jeweler>, ["~> 1.8.4"])
+ s.add_dependency(%q<uuid>, [">= 0"])
  end
  end

@@ -20,7 +20,14 @@ module Bio
  range.each do |value|
  cmd_line = options[:cmd].gsub(/<(\d+),(\d+)(,\d+)*>/,value.to_s)
  job = Bio::Grid::Job.new(options) # inherit global options
- job.options[:parameter_value] = "-param-#{value}"
+ job.options[:parameter_value] = "-param:#{value}"
+ job.execute(cmd_line,inputs,input1,groups,index)
+ end
+ elsif options[:params]
+ options[:params].each do |p|
+ cmd_line = options[:cmd].gsub(/<param>|<parameter>/,p)
+ job = Bio::Grid::Job.new(options)
+ job.options[:parameter_value] = "-param:#{p}"
  job.execute(cmd_line,inputs,input1,groups,index)
  end
  else
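
Note on the numeric branch of the hunk above: the code that builds `range` is not part of this diff, so the sketch below is only a guess at how a placeholder matching `/<(\d+),(\d+)(,\d+)*>/` might be expanded. It assumes the placeholder means `<start,stop,step>` with the step defaulting to 1, which is consistent with the README's "22, 24, 26" example but is not confirmed by the source. The `expand_range` helper and the template are hypothetical.

```ruby
# Same pattern as in the hunk above.
PLACEHOLDER = /<(\d+),(\d+)(,\d+)*>/

# Hypothetical helper (assumption: <start,stop,step>, step defaults to 1).
def expand_range(template)
  match = template.match(PLACEHOLDER)
  return [template] unless match

  start = match[1].to_i
  stop  = match[2].to_i
  step  = match[3] ? match[3].delete(",").to_i : 1

  # One command line per sampled value, as in the range.each loop.
  start.step(stop, step).map { |value| template.gsub(PLACEHOLDER, value.to_s) }
end

commands = expand_range("bowtie2 -L <22,32,2> -x genome_index")
commands.each { |c| puts c }
```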
@@ -2,14 +2,16 @@ module Bio
  class Grid
  class Job

- attr_accessor :options, :instructions, :job_output, :runner
+ attr_accessor :options, :instructions, :job_output, :runner, :uuid
  def initialize(options)
  @options = options
- self.instructions = ""
+ self.instructions = []
+ self.uuid = UUID.new.generate.split("-").first
  end

  def set_output_dir
- self.instructions << ("mkdir -p #{self.options[:output]}\n")
+ output_dir = (self.options[:output_folder]) ? "mkdir -p #{self.job_output}\ncd #{self.job_output}\n" : "mkdir -p #{self.options[:output]}\n"
+ self.instructions.insert(1,output_dir)
  end

  def set_commandline(cmd_line,inputs,input1,groups,index)
@@ -20,13 +22,13 @@ module Bio
  job_output = ""
  if commandline =~/<output>\.(\S+)/
  extension = $1
- job_output = self.options[:output]+"/#{Time.now.to_i}"+self.options[:name]+"_output_%03d" % (index+1).to_s + "#{self.options[:parameter_value]}"
+ job_output = self.options[:output]+"/#{self.uuid}_"+self.options[:name]+"_output_%03d" % (index+1).to_s + "#{self.options[:parameter_value]}"
  commandline.gsub!(/<output>/,job_output)
  job_output << ".#{extension}"
  else
  self.options[:output_folder] = true
- commandline.gsub!(/<output>/,self.options[:output]+"/#{Time.now.to_i}_"+self.options[:name])
- job_output = self.options[:output]+"/#{Time.now.to_i}_"+self.options[:name]
+ job_output = self.options[:output]+"/#{self.uuid}_"+self.options[:name]
+ commandline.gsub!(/<output>/,job_output)
  end
  self.instructions << commandline+"\n"
  self.job_output = job_output
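
Note on the switch from `Time.now.to_i` to a per-job identifier in the hunk above: a timestamp in whole seconds is the same for every job built within that second, whereas an identifier generated once per job is not tied to the clock resolution. The sketch below illustrates the idea with `SecureRandom` from the standard library, swapped in for the `uuid` gem so the example runs without extra dependencies. The `short_id` helper is hypothetical.

```ruby
require 'securerandom'

# Old naming scheme: two names built within the same second share a prefix.
stamp_a = Time.now.to_i
stamp_b = Time.now.to_i

# New naming scheme, sketched with SecureRandom instead of the uuid gem:
# keep only the first dash-separated group of the UUID string.
def short_id
  SecureRandom.uuid.split("-").first
end

ids = Array.new(5) { short_id }
puts "timestamps equal: #{stamp_a == stamp_b}"
puts ids.inspect
```

The 8-character prefix is not guaranteed to be unique; it is only far less likely to collide than a one-second timestamp.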
@@ -48,7 +50,7 @@ module Bio
  def write_runner(filename)
  self.runner = filename
  out = File.open(Dir.pwd+"/"+filename,"w")
- out.write(self.instructions+"\n")
+ out.write(self.instructions.join+"\n")
  out.close
  end

@@ -63,8 +65,8 @@ module Bio

  def execute(command_line,inputs,input1,groups,index)
  self.set_scheduler_options(:pbs) # set script specific options for the scheduling system
- self.set_output_dir
  self.set_commandline(command_line,inputs,input1,groups,index)
+ self.set_output_dir
  self.append_options
  job_filename = (self.options[:keep]) ? "job_#{index+1}#{self.options[:parameter_value]}.sh" : "job.sh"
  self.run(job_filename)
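
Note on the reordering in the hunk above: `set_output_dir` now reads `job_output` and `options[:output_folder]`, both of which are set by `set_commandline`, so it has to run afterwards. Because `instructions` became an array, the directory setup can still be slotted in ahead of the command with `insert(1, ...)`. The sketch below illustrates that mechanism only. It assumes the scheduler header occupies exactly the first element, and the header, command and paths are made up.

```ruby
instructions = []

# Stand-in for set_scheduler_options (assumed to add a single element).
instructions << "#PBS -N example_job\n"

# Stand-in for set_commandline: the command is appended next.
instructions << "tool --in reads.fq > out/result.sam\n"

# Stand-in for set_output_dir: runs last, but lands before the command.
instructions.insert(1, "mkdir -p out\n")

# write_runner joins the pieces into the final script text.
script = instructions.join
puts script
```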
@@ -1,2 +1,3 @@
+ require 'uuid'
  require 'bio/grid'
  require 'bio/grid/job'
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: bio-grid
  version: !ruby/object:Gem::Version
- version: 0.2.5
+ version: 0.2.6
  prerelease:
  platform: ruby
  authors:
@@ -11,6 +11,22 @@ bindir: bin
  cert_chain: []
  date: 2012-09-24 00:00:00.000000000 Z
  dependencies:
+ - !ruby/object:Gem::Dependency
+ name: uuid
+ requirement: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ! '>='
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ! '>='
+ - !ruby/object:Gem::Version
+ version: '0'
  - !ruby/object:Gem::Dependency
  name: rspec
  requirement: !ruby/object:Gem::Requirement
@@ -75,6 +91,22 @@ dependencies:
  - - ~>
  - !ruby/object:Gem::Version
  version: 1.8.4
+ - !ruby/object:Gem::Dependency
+ name: uuid
+ requirement: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ! '>='
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ! '>='
+ - !ruby/object:Gem::Version
+ version: '0'
  description: A BioGem to submit jobs on a queue system
  email: francesco.strozzi@gmail.com
  executables:
@@ -114,7 +146,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
  version: '0'
  segments:
  - 0
- hash: 2636116439556584152
+ hash: -4333638046284412790
  required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements: