bio-grid 0.2.5 → 0.2.6

data/Gemfile CHANGED
@@ -2,7 +2,7 @@ source "http://rubygems.org"
2
2
  # Add dependencies required to use your gem here.
3
3
  # Example:
4
4
  # gem "activesupport", ">= 2.3.5"
5
-
5
+ gem "uuid"
6
6
  # Add dependencies to develop your gem here.
7
7
  # Include everything needed to run rake, tests, features, etc.
8
8
  group :development do
@@ -10,4 +10,5 @@ group :development do
10
10
  gem "rdoc", "~> 3.12"
11
11
  gem "bundler", "> 1.0.0"
12
12
  gem "jeweler", "~> 1.8.4"
13
+ gem "uuid"
13
14
  end
@@ -9,6 +9,8 @@ GEM
9
9
  rake
10
10
  rdoc
11
11
  json (1.7.5)
12
+ macaddr (1.6.1)
13
+ systemu (~> 2.5.0)
12
14
  rake (0.9.2.2)
13
15
  rdoc (3.12)
14
16
  json (~> 1.4)
@@ -20,6 +22,9 @@ GEM
20
22
  rspec-expectations (2.8.0)
21
23
  diff-lcs (~> 1.1.2)
22
24
  rspec-mocks (2.8.0)
25
+ systemu (2.5.2)
26
+ uuid (2.3.5)
27
+ macaddr (~> 1.0)
23
28
 
24
29
  PLATFORMS
25
30
  ruby
@@ -29,3 +34,4 @@ DEPENDENCIES
29
34
  jeweler (~> 1.8.4)
30
35
  rdoc (~> 3.12)
31
36
  rspec (~> 2.8.0)
37
+ uuid
data/README.md CHANGED
@@ -82,12 +82,13 @@ For a complete list of current BioGrid parameters, type "bio-grid -h":
82
82
  -s, --split-number NUMBER Number of input files (or group of files) to use per job. If all the files in a location need to be used for a single job, just specify 'all'
83
83
  -p, --processes PROCESSES Number of processes per job
84
84
  -c, --command-line COMMANDLINE Command line to be executed
85
- -o, --output OUTPUT Output folder
85
+ -o, --output OUTPUT Output folder. Needs a <output> placeholder in the command line
86
86
  -r, --copy-to LOCATION Copy the output once a job is terminated
87
87
  -e, --erease-output Delete job output data when completed (useful to delete output temporary files on a computing node)
88
+ -a, --params PARAM1,PARAM2... List of parameters to use for testing. Needs a <param> placeholder in the command line
88
89
  -d, --dry Dry run. Just write the job scripts without sending them in queue (for debugging or testing)
89
90
  -t, --test Start the mapping only with the first group of reads (e.g. for testing parameters)
90
- -i, --input INPUT1,INPUT2... Location where to find input files (accepts wildcards). You can specify more than one input location, just provide a comma separated list
91
+ -i, --input INPUT1,INPUT2... Location where to find input files (accepts wildcards). Needs <input(1,2,3...> placeholder(s) in the command line
91
92
  --sep SEPARATOR Input file separator [Default: , ]
92
93
  --keep-scripts Keep all the running scripts created for all the jobs
93
94
  -h, --help Display this screen
@@ -99,7 +100,7 @@ Advanced stuff
99
100
  Ok let's unleash the potential of BioGrid.
100
101
  By putting together an automatic system to generate and submit jobs on a queue systems and a command line template approach, we can do some interesting things.
101
102
 
102
- Parameters sampling and testing
103
+ Numerical parameters sampling and testing
103
104
  -------------------------------
104
105
 
105
106
  The tipical scenario is when I have to run a tool on a new dataset and I would like to test different parameters to asses which are the better ones for my analysis.
@@ -119,7 +120,18 @@ So in this case, the ```-L``` parameter will take 6 different values: 22, 24, 26
119
120
 
120
121
  Last but not least, the ```-t``` option is essential so that only a single job per input file (or group of files) will be executed. Sampling parameters values is a typical combinatorial approach and this option avoids generating hundreds of different jobs only to sample a parameter. Coming back to the initial example, if I have 60 pairs of FastQ files, without the ```-t``` option, the job number will be 60x6 = 360, which is just crazy when you only want to test different parameter values.
121
122
 
122
- So far, BioGrid does not support sampling more than one parameter at the same time.
123
+ Others parameters sampling
124
+ --------------------------
125
+
126
+ If you want to sample non-numerical parameters, with BioGrid it is possible to use the ```--params``` option. So for instance, if I want to run Bowtie on my dataset to assess the results differences using the ```--sensitive```, ```--very-sensitive``` and ```--fast``` options, I can do it easely in this way:
127
+
128
+ ```shell
129
+ bio-grid -i "/data/Project_X/Sample_Y/*_R1_*.fastq.gz","/data/Project_X/Sample_Y/*_R2_*.fastq.gz" -n bowtie_mapping -c "/software/bowtie2 -x /genomes/genome_index -p 8 <param> -1 <input1> -2 <input2> > <output>.sam" -o /data/Project_X/Sample_Y_mapping -s 1 -p 8 -r /results/Sample_Y_mapping -e --param "--sensitive","--very-sensitive","--fast" -t
130
+ ```
131
+
132
+ In this case, the key points are the ```<param>``` placeholder in the command line and the corresponding ```--params``` options in BioGrid, which specify a list of parameters to be used to generate and run different jobs, each one with a different parameter in the list. Again, even in this case, it is recommended to do parameters testing using the ```-t``` option, which only runs a single job and not the full job array.
133
+
134
+ So far, BioGrid does not support, for each run, sampling more than one parameter at the same time.
123
135
 
124
136
  Contributing to bioruby-grid
125
137
  ============================
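
Reviewer note on the new `--params` section of the README: the substitution it describes can be sketched in a few lines of Ruby. This is only an illustration, not BioGrid's own code path. `expand_params` is a hypothetical helper; only the `<param>` placeholder, the command template and the three Bowtie options are taken from the README text.

```ruby
# Hypothetical helper (not part of bio-grid): build one command line per
# parameter by filling the <param> placeholder of a command template.
def expand_params(template, params)
  # Block form of gsub inserts the parameter literally, with no
  # special handling of backslash sequences in the replacement.
  params.map { |param| template.gsub("<param>") { param } }
end

template = "/software/bowtie2 -x /genomes/genome_index -p 8 <param> " \
           "-1 <input1> -2 <input2> > <output>.sam"
commands = expand_params(template, ["--sensitive", "--very-sensitive", "--fast"])

# One command per parameter; <input1>, <input2> and <output> are left
# untouched here because they are resolved in a separate step.
commands.each { |cmd| puts cmd }
```

Each entry in the list yields its own command, which is why the README recommends `-t`: without it, the number of jobs is multiplied by the number of input groups.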
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.2.5
1
+ 0.2.6
@@ -26,7 +26,7 @@ optparse = OptionParser.new do |opts|
26
26
  options[:cmd] = cmd
27
27
  end
28
28
 
29
- opts.on("-o","--output OUTPUT","Output folder") do |out|
29
+ opts.on("-o","--output OUTPUT","Output folder. Needs a <output> placeholder in the command line") do |out|
30
30
  options[:output] = out
31
31
  end
32
32
 
@@ -38,13 +38,17 @@ optparse = OptionParser.new do |opts|
38
38
  options[:clean] = true
39
39
  end
40
40
 
41
+ opts.on("-a","--params PARAM1,PARAM2...",Array,"List of parameters to use for testing. Needs a <param> placeholder in the command line") do |params|
42
+ options[:params] = params
43
+ end
44
+
41
45
  opts.on("-d","--dry","Dry run. Just write the job scripts without sending them in queue (for debugging or testing)") {options[:dry] = true}
42
46
 
43
47
  opts.on("-t","--test","Start the mapping only with the first group of reads (e.g. for testing parameters)") do |test|
44
48
  options[:test] = true
45
49
  end
46
50
 
47
- opts.on("-i","--input INPUT1,INPUT2...",Array,"Location where to find input files (accepts wildcards). You can specify more than one input location, just provide a comma separated list") do |input|
51
+ opts.on("-i","--input INPUT1,INPUT2...",Array,"Location where to find input files (accepts wildcards). Needs <input(1,2,3...> placeholder(s) in the command line") do |input|
48
52
  options[:input] = input
49
53
  end
50
54
 
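
Reviewer note on the new `-a/--params` switch: the `Array` argument in `opts.on` asks OptionParser to split the value on commas before the block runs. A minimal, standalone sketch of that behaviour follows. `parse_params` is a hypothetical wrapper written for this note, and the argument lists are made up.

```ruby
require "optparse"

# Hypothetical wrapper: declare the switch the same way the diff does
# and return the parsed options hash.
def parse_params(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.on("-a", "--params PARAM1,PARAM2...", Array,
            "List of parameters to use for testing") do |params|
      options[:params] = params
    end
  end
  parser.parse!(argv.dup) # work on a copy so the caller's array is untouched
  options
end

# The "=" form keeps values that start with dashes attached to the switch.
p parse_params(["--params=--sensitive,--very-sensitive,--fast"])[:params]
p parse_params(["-a", "x,y,z"])[:params]
```

OptionParser also accepts unambiguous abbreviations of long switches, which is presumably why the README example can spell the option `--param`.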
@@ -5,7 +5,7 @@
5
5
 
6
6
  Gem::Specification.new do |s|
7
7
  s.name = "bio-grid"
8
- s.version = "0.2.5"
8
+ s.version = "0.2.6"
9
9
 
10
10
  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11
11
  s.authors = ["Francesco Strozzi"]
@@ -44,21 +44,27 @@ Gem::Specification.new do |s|
44
44
  s.specification_version = 3
45
45
 
46
46
  if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
47
+ s.add_runtime_dependency(%q<uuid>, [">= 0"])
47
48
  s.add_development_dependency(%q<rspec>, ["~> 2.8.0"])
48
49
  s.add_development_dependency(%q<rdoc>, ["~> 3.12"])
49
50
  s.add_development_dependency(%q<bundler>, ["> 1.0.0"])
50
51
  s.add_development_dependency(%q<jeweler>, ["~> 1.8.4"])
52
+ s.add_development_dependency(%q<uuid>, [">= 0"])
51
53
  else
54
+ s.add_dependency(%q<uuid>, [">= 0"])
52
55
  s.add_dependency(%q<rspec>, ["~> 2.8.0"])
53
56
  s.add_dependency(%q<rdoc>, ["~> 3.12"])
54
57
  s.add_dependency(%q<bundler>, ["> 1.0.0"])
55
58
  s.add_dependency(%q<jeweler>, ["~> 1.8.4"])
59
+ s.add_dependency(%q<uuid>, [">= 0"])
56
60
  end
57
61
  else
62
+ s.add_dependency(%q<uuid>, [">= 0"])
58
63
  s.add_dependency(%q<rspec>, ["~> 2.8.0"])
59
64
  s.add_dependency(%q<rdoc>, ["~> 3.12"])
60
65
  s.add_dependency(%q<bundler>, ["> 1.0.0"])
61
66
  s.add_dependency(%q<jeweler>, ["~> 1.8.4"])
67
+ s.add_dependency(%q<uuid>, [">= 0"])
62
68
  end
63
69
  end
64
70
 
@@ -20,7 +20,14 @@ module Bio
20
20
  range.each do |value|
21
21
  cmd_line = options[:cmd].gsub(/<(\d+),(\d+)(,\d+)*>/,value.to_s)
22
22
  job = Bio::Grid::Job.new(options) # inherit global options
23
- job.options[:parameter_value] = "-param-#{value}"
23
+ job.options[:parameter_value] = "-param:#{value}"
24
+ job.execute(cmd_line,inputs,input1,groups,index)
25
+ end
26
+ elsif options[:params]
27
+ options[:params].each do |p|
28
+ cmd_line = options[:cmd].gsub(/<param>|<parameter>/,p)
29
+ job = Bio::Grid::Job.new(options)
30
+ job.options[:parameter_value] = "-param:#{p}"
24
31
  job.execute(cmd_line,inputs,input1,groups,index)
25
32
  end
26
33
  else
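
Reviewer note on the numeric branch in this hunk: it relies on the `<(\d+),(\d+)(,\d+)*>` pattern and tags each job with a `-param:` suffix. The sketch below shows one way such a placeholder could be expanded. Reading the three numbers as start, stop and optional step is an assumption, since the code that builds `range` lies outside the hunk. `expand_numeric` and the sample command are hypothetical.

```ruby
# Pattern copied from the diff; everything else is illustrative.
NUMERIC_PLACEHOLDER = /<(\d+),(\d+)(,\d+)*>/

# Hypothetical helper: return [command_line, suffix] pairs, one per value.
# Assumes the placeholder means <start,stop[,step]>.
def expand_numeric(cmd)
  match = cmd.match(NUMERIC_PLACEHOLDER)
  return [[cmd, ""]] unless match # no placeholder: a single, untagged job

  start = match[1].to_i
  stop  = match[2].to_i
  step  = match[3] ? match[3].delete(",").to_i : 1
  raise ArgumentError, "step must be positive" unless step.positive?

  start.step(stop, step).map do |value|
    [cmd.gsub(NUMERIC_PLACEHOLDER, value.to_s), "-param:#{value}"]
  end
end

expand_numeric("my_tool -L <10,20,5> reads.fq").each do |line, suffix|
  puts "#{suffix}  #{line}"
end
```

The suffix mirrors the `-param:#{value}` string from the diff, which ends up in both the output name and the kept job script name.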
@@ -2,14 +2,16 @@ module Bio
2
2
  class Grid
3
3
  class Job
4
4
 
5
- attr_accessor :options, :instructions, :job_output, :runner
5
+ attr_accessor :options, :instructions, :job_output, :runner, :uuid
6
6
  def initialize(options)
7
7
  @options = options
8
- self.instructions = ""
8
+ self.instructions = []
9
+ self.uuid = UUID.new.generate.split("-").first
9
10
  end
10
11
 
11
12
  def set_output_dir
12
- self.instructions << ("mkdir -p #{self.options[:output]}\n")
13
+ output_dir = (self.options[:output_folder]) ? "mkdir -p #{self.job_output}\ncd #{self.job_output}\n" : "mkdir -p #{self.options[:output]}\n"
14
+ self.instructions.insert(1,output_dir)
13
15
  end
14
16
 
15
17
  def set_commandline(cmd_line,inputs,input1,groups,index)
@@ -20,13 +22,13 @@ module Bio
20
22
  job_output = ""
21
23
  if commandline =~/<output>\.(\S+)/
22
24
  extension = $1
23
- job_output = self.options[:output]+"/#{Time.now.to_i}"+self.options[:name]+"_output_%03d" % (index+1).to_s + "#{self.options[:parameter_value]}"
25
+ job_output = self.options[:output]+"/#{self.uuid}_"+self.options[:name]+"_output_%03d" % (index+1).to_s + "#{self.options[:parameter_value]}"
24
26
  commandline.gsub!(/<output>/,job_output)
25
27
  job_output << ".#{extension}"
26
28
  else
27
29
  self.options[:output_folder] = true
28
- commandline.gsub!(/<output>/,self.options[:output]+"/#{Time.now.to_i}_"+self.options[:name])
29
- job_output = self.options[:output]+"/#{Time.now.to_i}_"+self.options[:name]
30
+ job_output = self.options[:output]+"/#{self.uuid}_"+self.options[:name]
31
+ commandline.gsub!(/<output>/,job_output)
30
32
  end
31
33
  self.instructions << commandline+"\n"
32
34
  self.job_output = job_output
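
Reviewer note on the naming change in this hunk: the prefix built from `Time.now.to_i` only has one-second resolution, so jobs created within the same second would share it. In the removed `else` branch it was also evaluated twice, so the path substituted into the command and the stored `job_output` could disagree across a second boundary. The sketch below reproduces the new naming scheme under stated assumptions: it uses `SecureRandom.uuid` from the standard library as a stand-in for the `uuid` gem's `UUID.new.generate`, and `short_token` and `job_output_path` are hypothetical names.

```ruby
require "securerandom"

# Stand-in for UUID.new.generate.split("-").first from the diff:
# the first dash-separated group of a UUID (8 hex characters).
def short_token
  SecureRandom.uuid.split("-").first
end

# Hypothetical helper mirroring the string built in the diff:
# <output>/<token>_<name>_output_<NNN><parameter suffix>
def job_output_path(output_dir, name, index, suffix, token)
  output_dir + "/#{token}_" + name + ("_output_%03d" % (index + 1)) + suffix.to_s
end

token = short_token # generated once per job, then reused everywhere
puts job_output_path("/data/out", "bowtie_mapping", 0, "-param:--fast", token)
```

Generating the token once in the constructor, as the diff does, keeps every path derived for a given job consistent.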
@@ -48,7 +50,7 @@ module Bio
48
50
  def write_runner(filename)
49
51
  self.runner = filename
50
52
  out = File.open(Dir.pwd+"/"+filename,"w")
51
- out.write(self.instructions+"\n")
53
+ out.write(self.instructions.join+"\n")
52
54
  out.close
53
55
  end
54
56
 
@@ -63,8 +65,8 @@ module Bio
63
65
 
64
66
  def execute(command_line,inputs,input1,groups,index)
65
67
  self.set_scheduler_options(:pbs) # set script specific options for the scheduling system
66
- self.set_output_dir
67
68
  self.set_commandline(command_line,inputs,input1,groups,index)
69
+ self.set_output_dir
68
70
  self.append_options
69
71
  job_filename = (self.options[:keep]) ? "job_#{index+1}#{self.options[:parameter_value]}.sh" : "job.sh"
70
72
  self.run(job_filename)
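
Reviewer note on the reordering in this hunk: `set_output_dir` now runs after `set_commandline`, because the directory to create (`job_output`) is only known once the command line has been processed. Switching `instructions` from a String to an Array lets the `mkdir` step still be placed ahead of the command with `insert(1, ...)`. A minimal sketch of that ordering follows. It assumes the scheduler options occupy exactly one entry at index 0, which the diff does not show, and `build_script`, the header and the command are hypothetical.

```ruby
# Hypothetical helper: assemble a job script in the same order as the
# new execute method (header, command, then a late-inserted mkdir step).
def build_script(header, command, output_dir)
  instructions = []
  instructions << header          # assumed: scheduler options are one entry
  instructions << command + "\n"  # command line is appended second
  # Output directory is only known now, yet must precede the command.
  instructions.insert(1, "mkdir -p #{output_dir}\ncd #{output_dir}\n")
  instructions.join
end

script = build_script("#!/bin/bash\n#PBS -N example_job\n",
                      "my_tool --threads 8 reads.fq",
                      "/data/out/abcd1234_example_job")
puts script
```

If the scheduler header ever spanned more than one array entry, index 1 would land inside it, so the fixed index is worth keeping in mind when reading the diff.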
@@ -1,2 +1,3 @@
1
+ require 'uuid'
1
2
  require 'bio/grid'
2
3
  require 'bio/grid/job'
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bio-grid
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.5
4
+ version: 0.2.6
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -11,6 +11,22 @@ bindir: bin
11
11
  cert_chain: []
12
12
  date: 2012-09-24 00:00:00.000000000 Z
13
13
  dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: uuid
16
+ requirement: !ruby/object:Gem::Requirement
17
+ none: false
18
+ requirements:
19
+ - - ! '>='
20
+ - !ruby/object:Gem::Version
21
+ version: '0'
22
+ type: :runtime
23
+ prerelease: false
24
+ version_requirements: !ruby/object:Gem::Requirement
25
+ none: false
26
+ requirements:
27
+ - - ! '>='
28
+ - !ruby/object:Gem::Version
29
+ version: '0'
14
30
  - !ruby/object:Gem::Dependency
15
31
  name: rspec
16
32
  requirement: !ruby/object:Gem::Requirement
@@ -75,6 +91,22 @@ dependencies:
75
91
  - - ~>
76
92
  - !ruby/object:Gem::Version
77
93
  version: 1.8.4
94
+ - !ruby/object:Gem::Dependency
95
+ name: uuid
96
+ requirement: !ruby/object:Gem::Requirement
97
+ none: false
98
+ requirements:
99
+ - - ! '>='
100
+ - !ruby/object:Gem::Version
101
+ version: '0'
102
+ type: :development
103
+ prerelease: false
104
+ version_requirements: !ruby/object:Gem::Requirement
105
+ none: false
106
+ requirements:
107
+ - - ! '>='
108
+ - !ruby/object:Gem::Version
109
+ version: '0'
78
110
  description: A BioGem to submit jobs on a queue system
79
111
  email: francesco.strozzi@gmail.com
80
112
  executables:
@@ -114,7 +146,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
114
146
  version: '0'
115
147
  segments:
116
148
  - 0
117
- hash: 2636116439556584152
149
+ hash: -4333638046284412790
118
150
  required_rubygems_version: !ruby/object:Gem::Requirement
119
151
  none: false
120
152
  requirements: