RubyGems - bio-grid - Versions diffs - 0.2.5 → 0.2.6 - Mend

bio-grid 0.2.5 → 0.2.6


Files changed (10)

data/Gemfile CHANGED

@@ -2,7 +2,7 @@ source "http://rubygems.org"
 # Add dependencies required to use your gem here.
 # Example:
 #   gem "activesupport", ">= 2.3.5"
+gem "uuid"
 # Add dependencies to develop your gem here.
 # Include everything needed to run rake, tests, features, etc.
 group :development do
@@ -10,4 +10,5 @@ group :development do
   gem "rdoc", "~> 3.12"
   gem "bundler", "> 1.0.0"
   gem "jeweler", "~> 1.8.4"
+	gem "uuid"
 end

data/Gemfile.lock CHANGED

@@ -9,6 +9,8 @@ GEM
       rake
       rdoc
     json (1.7.5)
+    macaddr (1.6.1)
+      systemu (~> 2.5.0)
     rake (0.9.2.2)
     rdoc (3.12)
       json (~> 1.4)
@@ -20,6 +22,9 @@ GEM
     rspec-expectations (2.8.0)
       diff-lcs (~> 1.1.2)
     rspec-mocks (2.8.0)
+    systemu (2.5.2)
+    uuid (2.3.5)
+      macaddr (~> 1.0)
 PLATFORMS
   ruby
@@ -29,3 +34,4 @@ DEPENDENCIES
   jeweler (~> 1.8.4)
   rdoc (~> 3.12)
   rspec (~> 2.8.0)
+  uuid

data/README.md CHANGED

@@ -82,12 +82,13 @@ For a complete list of current BioGrid parameters, type "bio-grid -h":
     -s, --split-number NUMBER        Number of input files (or group of files) to use per job. If all the files in a location need to be used for a single job, just specify 'all'
     -p, --processes PROCESSES        Number of processes per job
     -c, --command-line COMMANDLINE   Command line to be executed
-    -o, --output OUTPUT              Output folder
+    -o, --output OUTPUT              Output folder. Needs a <output> placeholder in the command line
     -r, --copy-to LOCATION           Copy the output once a job is terminated
     -e, --erease-output              Delete job output data when completed (useful to delete output temporary files on a computing node)
+    -a, --params PARAM1,PARAM2...    List of parameters to use for testing. Needs a <param> placeholder in the command line
     -d, --dry                        Dry run. Just write the job scripts without sending them in queue (for debugging or testing)
     -t, --test                       Start the mapping only with the first group of reads (e.g. for testing parameters)
-    -i, --input INPUT1,INPUT2...     Location where to find input files (accepts wildcards). You can specify more than one input location, just provide a comma separated list
+    -i, --input INPUT1,INPUT2...     Location where to find input files (accepts wildcards). Needs <input(1,2,3...> placeholder(s) in the command line
         --sep SEPARATOR              Input file separator [Default: , ]
         --keep-scripts               Keep all the running scripts created for all the jobs
     -h, --help                       Display this screen
@@ -99,7 +100,7 @@ Advanced stuff
 Ok let's unleash the potential of BioGrid.
 By putting together an automatic system to generate and submit jobs on a queue systems and a command line template approach, we can do some interesting things.
-Parameters sampling and testing
+Numerical parameters sampling and testing
 -------------------------------
 The tipical scenario is when I have to run a tool on a new dataset and I would like to test different parameters to asses which are the better ones for my analysis.
@@ -119,7 +120,18 @@ So in this case, the ```-L``` parameter will take 6 different values: 22, 24, 26
 Last but not least, the ```-t``` option is essential so that only a single job per input file (or group of files) will be executed. Sampling parameters values is a typical combinatorial approach and this option avoids generating hundreds of different jobs only to sample a parameter. Coming back to the initial example, if I have 60 pairs of FastQ files, without the ```-t``` option, the job number will be 60x6 = 360, which is just crazy when you only want to test different parameter values.
-So far, BioGrid does not support sampling more than one parameter at the same time.
+Others parameters sampling
+--------------------------
+If you want to sample non-numerical parameters, with BioGrid it is possible to use the ```--params``` option. So for instance, if I want to run Bowtie on my dataset to assess the results differences using the ```--sensitive```, ```--very-sensitive``` and ```--fast``` options, I can do it easely in this way:
+```shell
+bio-grid -i "/data/Project_X/Sample_Y/*_R1_*.fastq.gz","/data/Project_X/Sample_Y/*_R2_*.fastq.gz" -n bowtie_mapping -c "/software/bowtie2 -x /genomes/genome_index -p 8 <param> -1 <input1> -2 <input2> > <output>.sam" -o /data/Project_X/Sample_Y_mapping -s 1 -p 8 -r /results/Sample_Y_mapping -e --param "--sensitive","--very-sensitive","--fast" -t
+```
+In this case, the key points are the ```<param>``` placeholder in the command line and the corresponding ```--params``` options in BioGrid, which specify a list of parameters to be used to generate and run different jobs, each one with a different parameter in the list. Again, even in this case, it is recommended to do parameters testing using the ```-t``` option, which only runs a single job and not the full job array.
+So far, BioGrid does not support, for each run, sampling more than one parameter at the same time.
 Contributing to bioruby-grid
 ============================
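
Note on the README change: the new section describes a `<param>` placeholder that is filled once per value passed to `--params`. As a rough illustration of what that means for the Bowtie example above, the sketch below expands the template into the three command lines. `expand_param` is a hypothetical helper written for this note, not a BioGrid function, and the `<input1>`, `<input2>` and `<output>` placeholders are left untouched because their substitution is not shown in this diff.

```ruby
# Illustrative sketch only; expand_param is a hypothetical helper, not BioGrid's API.
TEMPLATE = "/software/bowtie2 -x /genomes/genome_index -p 8 <param> " \
           "-1 <input1> -2 <input2> > <output>.sam"
PARAMS = ["--sensitive", "--very-sensitive", "--fast"]

# Return one command line per parameter, with <param> replaced by that value.
def expand_param(template, params)
  params.map { |p| template.gsub("<param>", p) }
end

expand_param(TEMPLATE, PARAMS).each { |cmd| puts cmd }
```

Each printed line corresponds to one job that would be generated for a given group of input files.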

data/VERSION CHANGED

@@ -1 +1 @@
-0.2.5
+0.2.6

data/bin/bio-grid CHANGED

@@ -26,7 +26,7 @@ optparse = OptionParser.new do |opts|
 		options[:cmd] = cmd
 	end
-	opts.on("-o","--output OUTPUT","Output folder") do |out|
+	opts.on("-o","--output OUTPUT","Output folder. Needs a <output> placeholder in the command line") do |out|
 		options[:output] = out
 	end
@@ -38,13 +38,17 @@ optparse = OptionParser.new do |opts|
 		options[:clean] = true
 	end
+	opts.on("-a","--params PARAM1,PARAM2...",Array,"List of parameters to use for testing. Needs a <param> placeholder in the command line") do |params|
+		options[:params] = params
+	end
 	opts.on("-d","--dry","Dry run. Just write the job scripts without sending them in queue (for debugging or testing)") {options[:dry] = true}
 	opts.on("-t","--test","Start the mapping only with the first group of reads (e.g. for testing parameters)") do |test|
 		options[:test] = true
 	end
-	opts.on("-i","--input INPUT1,INPUT2...",Array,"Location where to find input files (accepts wildcards). You can specify more than one input location, just provide a comma separated list") do |input|
+	opts.on("-i","--input INPUT1,INPUT2...",Array,"Location where to find input files (accepts wildcards). Needs <input(1,2,3...> placeholder(s) in the command line") do |input|
 		options[:input] = input
 	end
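
Note on the option parsing: the new `-a/--params` switch is declared with the `Array` type, so Ruby's standard `OptionParser` splits the argument on commas before the block runs. The sketch below is a cut-down, hypothetical `parse_cli` helper (only the `--params` and `--input` switches are reproduced, with shortened descriptions) that shows the resulting values. It also shows that an unambiguous abbreviation such as `--param` is accepted, which is the spelling used in the README example.

```ruby
require 'optparse'

# Hypothetical, reduced version of the bin/bio-grid parser for illustration.
def parse_cli(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.on("-a", "--params PARAM1,PARAM2...", Array,
            "List of parameters to use for testing") do |params|
      options[:params] = params
    end
    opts.on("-i", "--input INPUT1,INPUT2...", Array,
            "Location where to find input files") do |input|
      options[:input] = input
    end
  end
  parser.parse(argv)
  options
end

p parse_cli(["--params", "a,b,c"])
p parse_cli(["--param=--sensitive,--fast"])  # abbreviated long option
```

The `=` form is used for the second call so that values starting with dashes are passed unambiguously as the option argument.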

data/bio-grid.gemspec CHANGED

@@ -5,7 +5,7 @@
 Gem::Specification.new do |s|
   s.name = "bio-grid"
-  s.version = "0.2.5"
+  s.version = "0.2.6"
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
   s.authors = ["Francesco Strozzi"]
@@ -44,21 +44,27 @@ Gem::Specification.new do |s|
     s.specification_version = 3
     if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
+      s.add_runtime_dependency(%q<uuid>, [">= 0"])
       s.add_development_dependency(%q<rspec>, ["~> 2.8.0"])
       s.add_development_dependency(%q<rdoc>, ["~> 3.12"])
       s.add_development_dependency(%q<bundler>, ["> 1.0.0"])
       s.add_development_dependency(%q<jeweler>, ["~> 1.8.4"])
+      s.add_development_dependency(%q<uuid>, [">= 0"])
     else
+      s.add_dependency(%q<uuid>, [">= 0"])
       s.add_dependency(%q<rspec>, ["~> 2.8.0"])
       s.add_dependency(%q<rdoc>, ["~> 3.12"])
       s.add_dependency(%q<bundler>, ["> 1.0.0"])
       s.add_dependency(%q<jeweler>, ["~> 1.8.4"])
+      s.add_dependency(%q<uuid>, [">= 0"])
     end
   else
+    s.add_dependency(%q<uuid>, [">= 0"])
     s.add_dependency(%q<rspec>, ["~> 2.8.0"])
     s.add_dependency(%q<rdoc>, ["~> 3.12"])
     s.add_dependency(%q<bundler>, ["> 1.0.0"])
     s.add_dependency(%q<jeweler>, ["~> 1.8.4"])
+    s.add_dependency(%q<uuid>, [">= 0"])
   end
 end

data/lib/bio/grid.rb CHANGED

@@ -20,7 +20,14 @@ module Bio
 					range.each do |value|
 						cmd_line = options[:cmd].gsub(/<(\d+),(\d+)(,\d+)*>/,value.to_s)
 						job = Bio::Grid::Job.new(options) # inherit global options
-						job.options[:parameter_value] = "-param-#{value}"
+						job.options[:parameter_value] = "-param:#{value}"
+						job.execute(cmd_line,inputs,input1,groups,index)
+					end
+				elsif options[:params]
+					options[:params].each do |p|
+						cmd_line = options[:cmd].gsub(/<param>|<parameter>/,p)
+						job = Bio::Grid::Job.new(options)
+						job.options[:parameter_value] = "-param:#{p}"
 						job.execute(cmd_line,inputs,input1,groups,index)
 					end
 				else
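
Note on naming: in the new branch each value is substituted for either `<param>` or `<parameter>`, and the same value is stored as the `-param:` suffix that `job.rb` appends to output names and, when `options[:keep]` is set, to the job script name. The sketch below reproduces only that naming logic in a hypothetical `plan_jobs` helper, so it is an approximation of the behaviour and not the gem's code path.

```ruby
# Hypothetical stand-alone sketch of the new --params branch; plan_jobs is not part of BioGrid.
def plan_jobs(cmd, params, index, keep_scripts)
  params.map do |p|
    suffix = "-param:#{p}" # same format as job.options[:parameter_value]
    {
      cmd_line: cmd.gsub(/<param>|<parameter>/, p),
      script: keep_scripts ? "job_#{index + 1}#{suffix}.sh" : "job.sh"
    }
  end
end

plan_jobs("tool <parameter> in.fq", ["--fast", "--sensitive"], 0, true).each do |job|
  puts "#{job[:script]}: #{job[:cmd_line]}"
end
```

One consequence worth checking when reviewing: with values such as `--fast`, the colon and leading dashes become part of the generated file names.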

data/lib/bio/grid/job.rb CHANGED

@@ -2,14 +2,16 @@ module Bio
 	class Grid
 		class Job
-			attr_accessor :options, :instructions, :job_output, :runner
+			attr_accessor :options, :instructions, :job_output, :runner, :uuid
 			def initialize(options)
 				@options = options
-				self.instructions = ""
+				self.instructions = []
+				self.uuid = UUID.new.generate.split("-").first
 			end
 			def	set_output_dir
-				self.instructions << ("mkdir -p #{self.options[:output]}\n")
+				output_dir = (self.options[:output_folder]) ? "mkdir -p #{self.job_output}\ncd #{self.job_output}\n" : "mkdir -p #{self.options[:output]}\n"
+				self.instructions.insert(1,output_dir)
 			end
 			def set_commandline(cmd_line,inputs,input1,groups,index)
@@ -20,13 +22,13 @@ module Bio
 				job_output = ""
 				if commandline =~/<output>\.(\S+)/
 					extension = $1
-					job_output = self.options[:output]+"/#{Time.now.to_i}"+self.options[:name]+"_output_%03d" % (index+1).to_s + "#{self.options[:parameter_value]}"
+					job_output = self.options[:output]+"/#{self.uuid}_"+self.options[:name]+"_output_%03d" % (index+1).to_s + "#{self.options[:parameter_value]}"
 					commandline.gsub!(/<output>/,job_output)
 					job_output << ".#{extension}"
 				else
 					self.options[:output_folder] = true
-					commandline.gsub!(/<output>/,self.options[:output]+"/#{Time.now.to_i}_"+self.options[:name])
-					job_output = self.options[:output]+"/#{Time.now.to_i}_"+self.options[:name]
+					job_output = self.options[:output]+"/#{self.uuid}_"+self.options[:name]
+					commandline.gsub!(/<output>/,job_output)
 				end
 				self.instructions << commandline+"\n"
 				self.job_output = job_output
@@ -48,7 +50,7 @@ module Bio
 			def write_runner(filename)
 				self.runner = filename
 				out = File.open(Dir.pwd+"/"+filename,"w")
-				out.write(self.instructions+"\n")
+				out.write(self.instructions.join+"\n")
 				out.close
 			end
@@ -63,8 +65,8 @@ module Bio
 			def	execute(command_line,inputs,input1,groups,index)
 				self.set_scheduler_options(:pbs) # set script specific options for the scheduling system
-				self.set_output_dir
         self.set_commandline(command_line,inputs,input1,groups,index)
+				self.set_output_dir
         self.append_options
         job_filename = (self.options[:keep]) ? "job_#{index+1}#{self.options[:parameter_value]}.sh" : "job.sh"
         self.run(job_filename)
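
Note on the job.rb change: output names now use a per-job identifier instead of `Time.now.to_i`, which in 0.2.5 was evaluated twice in the folder branch and could repeat for jobs created within the same second. The instructions also become an array, so the `mkdir` step can be inserted at index 1 after the command line has been resolved. The sketch below mimics that ordering under two labeled assumptions: `SecureRandom.uuid` from the standard library stands in for the uuid gem's `UUID.new.generate`, and the scheduler header is assumed to occupy exactly one array element (`set_scheduler_options` is not shown in this diff). The helper names and the `#PBS` line are hypothetical.

```ruby
require 'securerandom'

# Stand-in for UUID.new.generate.split("-").first (assumption: stdlib instead of the uuid gem).
def short_job_id
  SecureRandom.uuid.split("-").first
end

# Build the script lines in the same order as Job#execute in 0.2.6.
def build_instructions(output_root, name, tool)
  instructions = []
  instructions << "#PBS -N #{name}\n"                 # hypothetical one-element scheduler header
  job_output = "#{output_root}/#{short_job_id}_#{name}"
  instructions << "#{tool} #{job_output}\n"           # command line, added by set_commandline
  instructions.insert(1, "mkdir -p #{job_output}\ncd #{job_output}\n") # set_output_dir, folder case
  instructions
end

puts build_instructions("/data/out", "demo", "mytool --out").join
```

If the scheduler header were ever pushed as more than one element, `insert(1, ...)` would land inside the header, so that assumption is worth verifying against the full source.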

data/lib/bioruby-grid.rb CHANGED

@@ -1,2 +1,3 @@
+require 'uuid'
 require 'bio/grid'
 require 'bio/grid/job'

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: bio-grid
 version: !ruby/object:Gem::Version
-  version: 0.2.5
+  version: 0.2.6
   prerelease:
 platform: ruby
 authors:
@@ -11,6 +11,22 @@ bindir: bin
 cert_chain: []
 date: 2012-09-24 00:00:00.000000000 Z
 dependencies:
+- !ruby/object:Gem::Dependency
+  name: uuid
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: rspec
   requirement: !ruby/object:Gem::Requirement
@@ -75,6 +91,22 @@ dependencies:
     - - ~>
       - !ruby/object:Gem::Version
         version: 1.8.4
+- !ruby/object:Gem::Dependency
+  name: uuid
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
 description: A BioGem to submit jobs on a queue system
 email: francesco.strozzi@gmail.com
 executables:
@@ -114,7 +146,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
       segments:
       - 0
-      hash: 2636116439556584152
+      hash: -4333638046284412790
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements: