bio-grid 0.2.0

data/.document ADDED
@@ -0,0 +1,5 @@
+ lib/**/*.rb
+ bin/*
+ -
+ features/**/*.feature
+ LICENSE.txt
data/.rspec ADDED
@@ -0,0 +1 @@
+ --color
data/Gemfile ADDED
@@ -0,0 +1,13 @@
+ source "http://rubygems.org"
+ # Add dependencies required to use your gem here.
+ # Example:
+ #   gem "activesupport", ">= 2.3.5"
+
+ # Add dependencies to develop your gem here.
+ # Include everything needed to run rake, tests, features, etc.
+ group :development do
+   gem "rspec", "~> 2.8.0"
+   gem "rdoc", "~> 3.12"
+   gem "bundler", "> 1.0.0"
+   gem "jeweler", "~> 1.8.4"
+ end
data/Gemfile.lock ADDED
@@ -0,0 +1,31 @@
+ GEM
+   remote: http://rubygems.org/
+   specs:
+     diff-lcs (1.1.3)
+     git (1.2.5)
+     jeweler (1.8.4)
+       bundler (~> 1.0)
+       git (>= 1.2.5)
+       rake
+       rdoc
+     json (1.7.5)
+     rake (0.9.2.2)
+     rdoc (3.12)
+       json (~> 1.4)
+     rspec (2.8.0)
+       rspec-core (~> 2.8.0)
+       rspec-expectations (~> 2.8.0)
+       rspec-mocks (~> 2.8.0)
+     rspec-core (2.8.0)
+     rspec-expectations (2.8.0)
+       diff-lcs (~> 1.1.2)
+     rspec-mocks (2.8.0)
+
+ PLATFORMS
+   ruby
+
+ DEPENDENCIES
+   bundler (> 1.0.0)
+   jeweler (~> 1.8.4)
+   rdoc (~> 3.12)
+   rspec (~> 2.8.0)
data/LICENSE.txt ADDED
@@ -0,0 +1,20 @@
+ Copyright (c) 2012 Francesco Strozzi
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,140 @@
+ bioruby-grid
+ ============
+
+ Utility to create and distribute jobs on a queue system. It is particularly suited to processing Big Data (e.g. NGS analyses), helping to generate hundreds of different jobs with ease to crunch large datasets.
+
+ Usage
+ =====
+
+ This utility is a command line tool built around the concept of a template that can be reused to generate tens, hundreds or thousands of different jobs to be submitted to a queue system.
+
+ For now the tool supports only PBS queue systems, but it can easily be extended to support other queueing systems.
+
+ A typical example
+ -----------------
+
+ Let's say I have a bunch of FastQ files that I want to analyze using my favorite read mapping tool. These files come from a typical Illumina paired-end sequencing run, with 60 files for read 1 and another 60 files for read 2. Since I have a distributed system, I want to spread the alignments across the cluster (or grid) to speed up the analysis as much as possible.
+
+ Instead of manually creating a number of running scripts, or writing a new script for every analysis, you can let BioGrid handle all of this and save time.
19
+
20
+ ```shell
21
+ bio-grid -i "/data/Project_X/Sample_Y/*_R1_*.fastq.gz","/data/Project_X/Sample_Y/*_R2_*.fastq.gz" -n bowtie_mapping -c "/software/bowtie2 -x /genomes/genome_index -p 8 -1 <input1> -2 <input2> > <output>.sam" -o /data/Project_X/Sample_Y_mapping -s 1 -p 8
22
+ ```
23
+
24
+ What is happening here is the following:
25
+
26
+ * the ```-i``` options specifies the input files or, as in this case, the location where to find input files based on a typical wildcard expression. You can actually specify as many input files/locations as you need using a comma separated list.
27
+ * the ```-n``` specify the job name
28
+ * the ```-c``` is the command line to be executed on the cluster / grid system. What BioGrid does is to fill in the ```<input1>```,```<input2>``` and ```<output>``` placeholders with the corresponding parameters passed on the command line. This is done for each input file (or each group of input files) and BioGrid will check if the ```<output>``` placeholder has an extension (like .sam, .out etc.) and will generate a unique output file name for each job. IMPORTANT: If no extension is specified for the ```<output>``` placeholder, BioGrid will assume the job will generate more than one output files and that those files will be saved into the folder specified by the "-o" option. Therefore it will manage the output as a whole directory, copying and/or removing the entire folder if "-r" and "-e" options are present (check the [Other options](https://github.com/fstrozzi/bioruby-grid#other-options) section to see what these options are expected to do).
29
+
30
+
31
+ * the ```-o``` set the location where output files for each job will be saved. Only provide the folder where you want to save the output file(s), BioGrid will take care of generating a unique file name for the output, if needed.
32
+ * the ```-s``` is a key parameter to specify the granularity of the jobs, setting the number of input files (or group of files, when more than one input placeholder is present in the command line) to be used for each job. So, going back to the FastQ example, if -s 1 is specified, each job will be run with exactly one FastQ R1 file and one FastQ R2 file. This gives you a great power in deciding how to split the entire dataset analysis across multiple computing nodes.
33
+ * the ```-p``` parameter indicates how many processes we want to use for each job. This number needs to match with the actual number of threads / processes that our command or tool will use for the analysis.
34
+
35
+ All of this is just turned into a submission script that will look like this:
36
+
37
+ ```shell
38
+ #!/bin/bash
39
+ #PBS -N bowtie_mapping
40
+ #PBS -l ncpus=8
41
+
42
+ mkdir -p /data/Project_X/Sample_Y_mapping
43
+ /software/bowtie2 -x /genomes/genome_index -p 8 -1 /data/Project_X/Sample_Y/Sample_Y_L001_R1_001.fastq.gz -2 Sample_Y_L001_R2_001.fastq.gz > /data/Project_X/Sample_Y_mapping/bowtie_mapping-output_001.sam
44
+ ```
45
+
46
+ and this will be repeated for every input file, according to the -s parameter. So, in this case given that we have 2 input files for each command line and that we had 60 R1 and 60 R2 FastQ files and we have specified "-s 1", 60 different jobs will be created and submitted, each with a specific read pair to be processed by Bowtie.
+
+ Other options
+ -------------
+
+ BioGrid also provides options to control how jobs are generated and what happens to their output, for example:
+
+ * ```-t``` to submit only the job(s) for the first input file (or group of files), which is useful to test parameters
+ * ```-r``` to specify a location, different from the one used in ```-o```, where job outputs will be copied once each job has terminated
+ * ```-e``` to erase the output files/folders in the ```-o``` location once a job is completed (useful in conjunction with ```-r``` to delete local data on a computing node)
+ * ```-d``` for a dry run, to create the submission scripts without sending them to the queue system (scripts are written to the current directory; add ```--keep-scripts``` to keep one script per job instead of a single, overwritten ```job.sh```)
+
+ The following BioGrid command line:
+
+ ```shell
+ bio-grid -i "/data/Project_X/Sample_Y/*_R1_*.fastq.gz","/data/Project_X/Sample_Y/*_R2_*.fastq.gz" -n bowtie_mapping -c "/software/bowtie2 -x /genomes/genome_index -p 8 -1 <input1> -2 <input2> > <output>.sam" -o /data/Project_X/Sample_Y_mapping -s 1 -p 8 -r /results/Sample_Y_mapping -e
+ ```
+
+ will be turned into this submission script:
+
+ ```shell
+ #!/bin/bash
+ #PBS -N bowtie_mapping
+ #PBS -l ncpus=8
+
+ mkdir -p /data/Project_X/Sample_Y_mapping # output dir
+ /software/bowtie2 -x /genomes/genome_index -p 8 -1 /data/Project_X/Sample_Y/Sample_Y_L001_R1_001.fastq.gz -2 /data/Project_X/Sample_Y/Sample_Y_L001_R2_001.fastq.gz > /data/Project_X/Sample_Y_mapping/bowtie_mapping_output_001.sam # command line
+ mkdir -p /results/Sample_Y_mapping # final location where to copy job output once terminated
+ cp /data/Project_X/Sample_Y_mapping/bowtie_mapping_output_001.sam /results/Sample_Y_mapping # copy the output to the final location
+ rm -f /data/Project_X/Sample_Y_mapping/bowtie_mapping_output_001.sam # delete the output data
+ ```
+
+ For a complete list of current BioGrid parameters, type "bio-grid -h":
+
+ ```
+ -n, --name NAME Analysis name
+ -s, --split-number NUMBER Number of input files (or group of files) to use per job. If all the files in a location need to be used for a single job, just specify 'all'
+ -p, --processes PROCESSES Number of processes per job
+ -c, --command-line COMMANDLINE Command line to be executed
+ -o, --output OUTPUT Output folder
+ -r, --copy-to LOCATION Copy the output once a job is terminated
+ -e, --erease-output Delete job output data when completed (useful to delete output temporary files on a computing node)
+ -d, --dry Dry run. Just write the job scripts without sending them in queue (for debugging or testing)
+ -t, --test Start the mapping only with the first group of reads (e.g. for testing parameters)
+ -i, --input INPUT1,INPUT2... Location where to find input files (accepts wildcards). You can specify more than one input location, just provide a comma separated list
+ --sep SEPARATOR Input file separator [Default: , ]
+ --keep-scripts Keep all the running scripts created for all the jobs
+ -h, --help Display this screen
+ ```
+
+ Advanced stuff
+ ==============
+
+ OK, let's unleash the potential of BioGrid.
+ By combining the automatic generation and submission of jobs on a queue system with a command line template approach, we can do some interesting things.
+
+ Parameters sampling and testing
+ -------------------------------
+
+ A typical scenario is when I have to run a tool on a new dataset and would like to test different parameters to assess which ones work best for my analysis.
+ This can be easily done with BioGrid. For example:
+
+ ```shell
+ bio-grid -i "/data/Project_X/Sample_Y/*_R1_*.fastq.gz","/data/Project_X/Sample_Y/*_R2_*.fastq.gz" -n bowtie_mapping -c "/software/bowtie2 -x /genomes/genome_index -p 8 -L <22,32,2> -1 <input1> -2 <input2> > <output>.sam" -o /data/Project_X/Sample_Y_mapping -s 1 -p 8 -r /results/Sample_Y_mapping -e -t
+ ```
+
+ The key points here are the ```-L <22,32,2>``` in the command line template and the ```-t``` option of BioGrid. The first tells BioGrid to generate a number of similar jobs, each one with a different value for the parameter ```-L```. The values are determined by the numbers passed within the ```< >```:
+
+ * the first number is the first value that the parameter will take
+ * the second number is the last value that the parameter will take
+ * the third number is the increment used to generate the range of values in between (optional; if omitted, the increment is 1)
+
+ So in this case, the ```-L``` parameter will take 6 different values: 22, 24, 26, 28, 30 and 32.
+
+ Last but not least, the ```-t``` option is essential so that only the first input file (or group of files) is processed, giving one job per parameter value. Sampling parameter values is a typical combinatorial approach, and this option avoids generating hundreds of different jobs only to sample a parameter. Coming back to the initial example, with 60 pairs of FastQ files and without the ```-t``` option, the number of jobs would be 60 x 6 = 360 (compared to just 6 with it), which is just crazy when you only want to test different parameter values.
+
+ So far, BioGrid does not support sampling more than one parameter at the same time.
+
+ Contributing to bioruby-grid
+ ============================
+
+ * Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
+ * Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it.
+ * Fork the project.
+ * Start a feature/bugfix branch.
+ * Commit and push until you are happy with your contribution.
+ * Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
+ * Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or it is otherwise necessary, that is fine, but please isolate it to its own commit so I can cherry-pick around it.
+
+ Copyright
+ =========
+
+ Copyright (c) 2012 Francesco Strozzi. See LICENSE.txt for
+ further details.
+
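The `<start,end,step>` syntax and the job-count arithmetic described in the README can be sketched in plain Ruby. This is a hypothetical re-implementation for illustration, not part of the bio-grid API. `expand_parameter_range` and `RANGE_PLACEHOLDER` are invented names. The sketch replaces only the first range placeholder, in line with the README's note that only one parameter can be sampled at a time. The gem's `Bio::Grid.run` does this with a similar regular expression and `gsub`.

```ruby
# Hypothetical helper (not part of bio-grid): expand a command line template that
# contains a single <start,end,step> placeholder into one command line per value.
RANGE_PLACEHOLDER = /<(\d+),(\d+)(?:,(\d+))?>/

def expand_parameter_range(template)
  match = RANGE_PLACEHOLDER.match(template)
  return [template] unless match # no range placeholder: a single command line

  first = match[1].to_i
  last = match[2].to_i
  step = match[3] ? match[3].to_i : 1 # the increment is optional and defaults to 1
  first.step(last, step).map { |value| template.sub(RANGE_PLACEHOLDER, value.to_s) }
end

template = "bowtie2 -x genome_index -p 8 -L <22,32,2> -1 <input1> -2 <input2> > <output>.sam"
commands = expand_parameter_range(template)
puts commands.first # the -L 22 variant; <input1>, <input2> and <output> are left for later

read_pairs = 60
puts "without -t: #{read_pairs * commands.size} jobs" # 60 x 6
puts "with -t: #{commands.size} jobs"                 # first read pair only
```

The `<input…>` placeholders are untouched here because the range pattern only matches a `<` followed directly by digits.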
data/Rakefile ADDED
@@ -0,0 +1,49 @@
+ # encoding: utf-8
+
+ require 'rubygems'
+ require 'bundler'
+ begin
+   Bundler.setup(:default, :development)
+ rescue Bundler::BundlerError => e
+   $stderr.puts e.message
+   $stderr.puts "Run `bundle install` to install missing gems"
+   exit e.status_code
+ end
+ require 'rake'
+
+ require 'jeweler'
+ Jeweler::Tasks.new do |gem|
+   # gem is a Gem::Specification... see http://docs.rubygems.org/read/chapter/20 for more options
+   gem.name = "bio-grid"
+   gem.homepage = "http://github.com/fstrozzi/bioruby-grid"
+   gem.license = "MIT"
+   gem.summary = %Q{A BioGem to submit jobs on a queue system}
+   gem.description = %{A BioGem to submit jobs on a queue system}
+   gem.email = "francesco.strozzi@gmail.com"
+   gem.authors = ["Francesco Strozzi"]
+   # dependencies defined in Gemfile
+ end
+ Jeweler::RubygemsDotOrgTasks.new
+
+ require 'rspec/core'
+ require 'rspec/core/rake_task'
+ RSpec::Core::RakeTask.new(:spec) do |spec|
+   spec.pattern = FileList['spec/**/*_spec.rb']
+ end
+
+ RSpec::Core::RakeTask.new(:rcov) do |spec|
+   spec.pattern = 'spec/**/*_spec.rb'
+   spec.rcov = true
+ end
+
+ task :default => :spec
+
+ require 'rdoc/task'
+ Rake::RDocTask.new do |rdoc|
+   version = File.exist?('VERSION') ? File.read('VERSION') : ""
+
+   rdoc.rdoc_dir = 'rdoc'
+   rdoc.title = "bioruby-grid #{version}"
+   rdoc.rdoc_files.include('README*')
+   rdoc.rdoc_files.include('lib/**/*.rb')
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
+ 0.2.0
data/bin/bio-grid ADDED
@@ -0,0 +1,73 @@
+ #!/usr/bin/env ruby
+
+ require 'optparse'
+ $:<< File.expand_path(File.join(File.dirname(File.dirname __FILE__),"lib"))
+ require 'bioruby-grid'
+
+ options = {}
+ options[:sep] = ","
+
+ optparse = OptionParser.new do |opts|
+   opts.banner = "\nCopyright(c) 2012 Francesco Strozzi\nUtility to create and distribute jobs on a queue system.\n\nE.g. #{$0} -i \"/Project_X/Sample_Y/*_R1_*.fastq\",\"/Project_X/Sample_Y/*_R2_*.fastq\" --name bowtie2 -s 10 -p 12 --command-line \"/software/bowtie2 -x /genomes/bowtie2_index/genome_index -1 <input1> -2 <input2> -p 12 > <output>.sam\" --output /tmp/Sample_Y_mapping --copy-to /archive/Sample_Y_mapping --erease-output\n\n\n"
+
+   opts.on("-n","--name NAME","Analysis name") do |name|
+     options[:name] = name
+   end
+
+   opts.on("-s","--split-number NUMBER","Number of input files (or group of files) to use per job. If all the files in a location need to be used for a single job, just specify 'all'") do |number|
+     options[:number] = number
+   end
+
+   opts.on("-p","--processes PROCESSES","Number of processes per job") do |processes|
+     options[:processes] = processes
+   end
+
+   opts.on("-c","--command-line COMMANDLINE","Command line to be executed") do |cmd|
+     options[:cmd] = cmd
+   end
+
+   opts.on("-o","--output OUTPUT","Output folder") do |out|
+     options[:output] = out
+   end
+
+   opts.on("-r","--copy-to LOCATION","Copy the output once a job is terminated") do |location|
+     options[:copy] = location
+   end
+
+   opts.on("-e","--erease-output","Delete job output data when completed (useful to delete output temporary files on a computing node)") do |clean|
+     options[:clean] = true
+   end
+
+   opts.on("-d","--dry","Dry run. Just write the job scripts without sending them in queue (for debugging or testing)") {options[:dry] = true}
+
+   opts.on("-t","--test","Start the mapping only with the first group of reads (e.g. for testing parameters)") do |test|
+     options[:test] = true
+   end
+
+   opts.on("-i","--input INPUT1,INPUT2...",Array,"Location where to find input files (accepts wildcards). You can specify more than one input location, just provide a comma separated list") do |input|
+     options[:input] = input
+   end
+
+   opts.on("--sep SEPARATOR","Input file separator [Default: , ]") do |sep|
+     options[:sep] = sep
+   end
+
+   opts.on("--keep-scripts","Keep all the running scripts created for all the jobs") {options[:keep] = true}
+
+   opts.on("-h","--help","Display this screen") do
+     puts opts
+     print "\n"
+     exit # stop after printing the help, before the required-option checks below
+   end
+ end
+
+ optparse.parse!
+
+ raise OptionParser::MissingArgument,"-i, --input [INPUT1,INPUT2...]\n" if options[:input].nil?
+ raise OptionParser::MissingArgument,"-c, --command-line [command line]\n" if options[:cmd].nil?
+ raise OptionParser::MissingArgument,"-n, --name [analysis name]\n" if options[:name].nil?
+ raise OptionParser::MissingArgument,"-o, --output [output folder]\n" if options[:output].nil?
+
+ Bio::Grid.run(options)
+
+
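The comma separated `-i` syntax used in the README works because of two steps. First, the shell removes the quotes in `"…/*_R1_*.fastq.gz","…/*_R2_*.fastq.gz"` and concatenates the pieces into a single argument, without expanding the wildcards. Second, OptionParser's built-in `Array` type splits that argument on commas. The minimal sketch below reproduces only `-i`, `-n` and `-h`. `parse_args` is a hypothetical name, not part of the gem.

```ruby
require "optparse"

# Hypothetical, reduced version of the option handling in bin/bio-grid: it shows how
# the Array type splits a comma separated -i value and how required options are checked.
def parse_args(argv)
  options = { sep: "," }
  parser = OptionParser.new do |opts|
    opts.on("-i", "--input INPUT1,INPUT2...", Array, "Input locations") { |list| options[:input] = list }
    opts.on("-n", "--name NAME", "Analysis name") { |name| options[:name] = name }
    opts.on("-h", "--help", "Display this screen") do
      puts opts
      exit # stop here, otherwise the required-option checks below would still run
    end
  end
  parser.parse!(argv)
  raise OptionParser::MissingArgument, "-i, --input" if options[:input].nil?
  raise OptionParser::MissingArgument, "-n, --name" if options[:name].nil?
  options
end

# Equivalent to: bio-grid -i "/d/*_R1_*.fastq.gz","/d/*_R2_*.fastq.gz" -n bowtie_mapping
p parse_args(["-i", "/d/*_R1_*.fastq.gz,/d/*_R2_*.fastq.gz", "-n", "bowtie_mapping"])
```

The wildcards reach Ruby unexpanded, so in the gem they are resolved later by `Dir.glob`.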
data/bio-grid.gemspec ADDED
@@ -0,0 +1,64 @@
+ # Generated by jeweler
+ # DO NOT EDIT THIS FILE DIRECTLY
+ # Instead, edit Jeweler::Tasks in Rakefile, and run 'rake gemspec'
+ # -*- encoding: utf-8 -*-
+
+ Gem::Specification.new do |s|
+   s.name = "bio-grid"
+   s.version = "0.2.0"
+
+   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
+   s.authors = ["Francesco Strozzi"]
+   s.date = "2012-09-20"
+   s.description = "A BioGem to submit jobs on a queue system"
+   s.email = "francesco.strozzi@gmail.com"
+   s.executables = ["bio-grid"]
+   s.extra_rdoc_files = [
+     "LICENSE.txt",
+     "README.md"
+   ]
+   s.files = [
+     ".document",
+     ".rspec",
+     "Gemfile",
+     "Gemfile.lock",
+     "LICENSE.txt",
+     "README.md",
+     "Rakefile",
+     "VERSION",
+     "bin/bio-grid",
+     "bio-grid.gemspec",
+     "lib/bio/grid.rb",
+     "lib/bio/grid/job.rb",
+     "lib/bioruby-grid.rb",
+     "spec/bioruby-grid_spec.rb",
+     "spec/spec_helper.rb"
+   ]
+   s.homepage = "http://github.com/fstrozzi/bioruby-grid"
+   s.licenses = ["MIT"]
+   s.require_paths = ["lib"]
+   s.rubygems_version = "1.8.24"
+   s.summary = "A BioGem to submit jobs on a queue system"
+
+   if s.respond_to? :specification_version then
+     s.specification_version = 3
+
+     if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
+       s.add_development_dependency(%q<rspec>, ["~> 2.8.0"])
+       s.add_development_dependency(%q<rdoc>, ["~> 3.12"])
+       s.add_development_dependency(%q<bundler>, ["> 1.0.0"])
+       s.add_development_dependency(%q<jeweler>, ["~> 1.8.4"])
+     else
+       s.add_dependency(%q<rspec>, ["~> 2.8.0"])
+       s.add_dependency(%q<rdoc>, ["~> 3.12"])
+       s.add_dependency(%q<bundler>, ["> 1.0.0"])
+       s.add_dependency(%q<jeweler>, ["~> 1.8.4"])
+     end
+   else
+     s.add_dependency(%q<rspec>, ["~> 2.8.0"])
+     s.add_dependency(%q<rdoc>, ["~> 3.12"])
+     s.add_dependency(%q<bundler>, ["> 1.0.0"])
+     s.add_dependency(%q<jeweler>, ["~> 1.8.4"])
+   end
+ end
+
data/lib/bio/grid.rb ADDED
@@ -0,0 +1,49 @@
+ module Bio
+
+   class Grid
+
+     attr_accessor :input,:number
+     def initialize(input,number)
+       @input = input
+       @number = number
+     end
+
+     def self.run(options)
+       grid = self.new options[:input], options[:number]
+       groups = grid.prepare_input_groups
+       inputs = groups.keys.sort
+       groups[inputs.shift].each_with_index do |input1,index|
+
+         if options[:cmd]=~/<(\d+),(\d+)(,\d+)*>/
+           step = ($3) ? $3.tr(",","").to_i : 1
+           range = Range.new($1.to_i,$2.to_i,false).step(step).to_a
+           range.each do |value|
+             cmd_line = options[:cmd].gsub(/<(\d+),(\d+)(,\d+)*>/,value.to_s)
+             job = Bio::Grid::Job.new(options) # inherit global options
+             job.options[:parameter_value] = "-param-#{value}"
+             job.execute(cmd_line,inputs,input1,groups,index)
+           end
+         else
+           job = Bio::Grid::Job.new(options) # inherit global options
+           job.execute(options[:cmd],inputs,input1,groups,index)
+         end
+
+         break if options[:test]
+       end
+     end
+
+     def prepare_input_groups
+       groups = Hash.new {|h,k| h[k] = [] }
+       self.input.each_with_index do |location,index|
+         if self.number == "all"
+           groups["input"] << Dir.glob(location).sort
+         else
+           Dir.glob(location).sort.each_slice(self.number.to_i) {|subgroup| groups["input#{index+1}"] << subgroup}
+         end
+       end
+       groups
+     end
+
+   end
+
+ end
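How a numeric `-s` value turns sorted input lists into per-job groups (the `each_slice` call in `prepare_input_groups`) can be illustrated without touching the filesystem. The sketch below is a hypothetical helper, not the gem's API. It takes plain arrays in place of `Dir.glob` results, does not model `-s all`, and uses made-up file names.

```ruby
# Hypothetical helper (not part of bio-grid): mimic how a numeric -s value slices each
# input location into per-job groups. Real runs obtain the lists via Dir.glob(location).
def group_inputs(file_lists, split)
  file_lists.each_with_index.each_with_object({}) do |(files, index), groups|
    # Each list is sorted independently, then cut into chunks of `split` files.
    groups["input#{index + 1}"] = files.sort.each_slice(split).to_a
  end
end

r1 = %w[Y_L001_R1_002.fastq.gz Y_L001_R1_001.fastq.gz]
r2 = %w[Y_L001_R2_002.fastq.gz Y_L001_R2_001.fastq.gz]
groups = group_inputs([r1, r2], 1)

# Job N receives the N-th group of every input, so R1 and R2 are paired by sort position.
groups["input1"].zip(groups["input2"]).each_with_index do |(reads1, reads2), n|
  puts "job #{n + 1}: -1 #{reads1.join(',')} -2 #{reads2.join(',')}"
end
```

Because the pairing is positional, the R1 and R2 patterns should match the same number of files and sort into the same order. Otherwise reads from different libraries could end up in the same job.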
data/lib/bio/grid/job.rb ADDED
@@ -0,0 +1,78 @@
+ module Bio
+   class Grid
+     class Job
+
+       attr_accessor :options, :instructions, :job_output, :runner
+       def initialize(options)
+         @options = options
+         self.instructions = ""
+       end
+
+       def set_output_dir
+         p "mkdir -p #{self.options[:output]}\n"
+         self.instructions << ("mkdir -p #{self.options[:output]}\n")
+       end
+
+       def set_commandline(cmd_line,inputs,input1,groups,index)
+         commandline = cmd_line.gsub(/<input1>|<input>/,input1.join(self.options[:sep]))
+         inputs.each do |input|
+           commandline.gsub!(/<#{input}>/,groups[input][index].join(self.options[:sep]))
+         end
+         job_output = ""
+         if commandline =~/<output>\.(\S+)/
+           extension = $1
+           job_output = self.options[:output]+"/"+self.options[:name]+"_output_%03d" % (index+1).to_s + "#{self.options[:parameter_value]}"
+           commandline.gsub!(/<output>/,job_output)
+           job_output << ".#{extension}"
+         else
+           self.options[:output_folder] = true
+           commandline.gsub!(/<output>/,self.options[:output])
+           job_output = self.options[:output]
+         end
+         self.instructions << commandline+"\n"
+         self.job_output = job_output
+       end
+
+       def append_options
+         if self.options[:copy]
+           self.instructions << ("mkdir -p #{self.options[:copy]}\n")
+           copy_type = (self.options[:output_folder]) ? "cp -r" : "cp"
+           self.instructions << ("#{copy_type} #{self.job_output} #{self.options[:copy]}\n")
+         end
+
+         if self.options[:clean]
+           rm_type = (self.options[:output_folder]) ? "rm -fr" : "rm -f"
+           self.instructions << ("#{rm_type} #{self.job_output}\n")
+         end
+       end
+
+       def write_runner(filename)
+         self.runner = filename
+         out = File.open(Dir.pwd+"/"+filename,"w")
+         out.write(self.instructions+"\n")
+         out.close
+         p filename
+       end
+
+       def run(filename)
+         self.write_runner(filename)
+         system("qsub #{self.runner}") unless self.options[:dry]
+       end
+
+       def set_scheduler_options(type)
+         self.instructions << "#!/bin/bash\n#PBS -N #{self.options[:name]}\n#PBS -l ncpus=#{self.options[:processes]}\n\n" if type == :pbs
+       end
+
+       def execute(command_line,inputs,input1,groups,index)
+         self.set_scheduler_options(:pbs) # set script specific options for the scheduling system
+         self.set_output_dir
+         self.set_commandline(command_line,inputs,input1,groups,index)
+         self.append_options
+         job_filename = (self.options[:keep]) ? "job_#{index+1}#{self.options[:parameter_value]}.sh" : "job.sh"
+         self.run(job_filename)
+       end
+
+
+     end
+   end
+ end
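The steps that `Bio::Grid::Job#execute` chains together (scheduler header, output directory, placeholder substitution, then optional copy and clean-up) can be condensed into one function for reading purposes. The sketch below is a hypothetical re-implementation, not the gem's API. `build_pbs_script` is an invented name. It returns the script as a string instead of writing `job.sh` and calling `qsub`, and it leaves out the `<input>` alias and the `-param-` suffix added during parameter sweeps. Its option keys mirror the ones set in `bin/bio-grid`.

```ruby
# Hypothetical, condensed re-implementation (not the gem's API) of the PBS script that
# Bio::Grid::Job#execute writes for one job. Option keys mirror those set in bin/bio-grid.
def build_pbs_script(opts, cmd, inputs, index)
  lines = ["#!/bin/bash", "#PBS -N #{opts[:name]}", "#PBS -l ncpus=#{opts[:processes]}", ""]
  lines << "mkdir -p #{opts[:output]}"

  # Fill <input1>, <input2>, ... with this job's files, joined by the --sep separator.
  line = cmd.dup
  inputs.each_with_index { |files, i| line = line.gsub("<input#{i + 1}>", files.join(opts[:sep])) }

  if (m = line.match(/<output>\.(\S+)/))
    # <output> has an extension: generate a unique, numbered file name.
    base = format("%s/%s_output_%03d", opts[:output], opts[:name], index + 1)
    job_output = "#{base}.#{m[1]}"
    line = line.gsub("<output>", base)
    folder = false
  else
    # No extension: the whole -o folder is treated as the job output.
    job_output = opts[:output]
    line = line.gsub("<output>", opts[:output])
    folder = true
  end
  lines << line

  if opts[:copy] # -r
    lines << "mkdir -p #{opts[:copy]}"
    lines << "#{folder ? 'cp -r' : 'cp'} #{job_output} #{opts[:copy]}"
  end
  lines << "#{folder ? 'rm -fr' : 'rm -f'} #{job_output}" if opts[:clean] # -e
  lines.join("\n") + "\n"
end

opts = { name: "bowtie_mapping", processes: 8, sep: ",", output: "/data/Project_X/Sample_Y_mapping",
         copy: "/results/Sample_Y_mapping", clean: true }
cmd = "/software/bowtie2 -x /genomes/genome_index -p 8 -1 <input1> -2 <input2> > <output>.sam"
inputs = [["/data/Project_X/Sample_Y/Sample_Y_L001_R1_001.fastq.gz"],
          ["/data/Project_X/Sample_Y/Sample_Y_L001_R2_001.fastq.gz"]]
puts build_pbs_script(opts, cmd, inputs, 0)
```

The `cp -r` / `rm -fr` switch reflects the README's rule that an `<output>` placeholder without an extension makes the whole `-o` folder the unit that is copied and removed.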
data/lib/bioruby-grid.rb ADDED
@@ -0,0 +1,2 @@
+ require 'bio/grid'
+ require 'bio/grid/job'
data/spec/bioruby-grid_spec.rb ADDED
@@ -0,0 +1,7 @@
+ require File.expand_path(File.dirname(__FILE__) + '/spec_helper')
+
+ describe "BiorubyGrid" do
+   it "fails" do
+     fail "hey buddy, you should probably rename this file and start specing for real"
+   end
+ end
data/spec/spec_helper.rb ADDED
@@ -0,0 +1,12 @@
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
+ $LOAD_PATH.unshift(File.dirname(__FILE__))
+ require 'rspec'
+ require 'bioruby-grid'
+
+ # Requires supporting files with custom matchers and macros, etc,
+ # in ./support/ and its subdirectories.
+ Dir["#{File.dirname(__FILE__)}/support/**/*.rb"].each {|f| require f}
+
+ RSpec.configure do |config|
+
+ end
metadata ADDED
@@ -0,0 +1,130 @@
+ --- !ruby/object:Gem::Specification
+ name: bio-grid
+ version: !ruby/object:Gem::Version
+   version: 0.2.0
+   prerelease:
+ platform: ruby
+ authors:
+ - Francesco Strozzi
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2012-09-20 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: rspec
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 2.8.0
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 2.8.0
+ - !ruby/object:Gem::Dependency
+   name: rdoc
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: '3.12'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: '3.12'
+ - !ruby/object:Gem::Dependency
+   name: bundler
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>'
+       - !ruby/object:Gem::Version
+         version: 1.0.0
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>'
+       - !ruby/object:Gem::Version
+         version: 1.0.0
+ - !ruby/object:Gem::Dependency
+   name: jeweler
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 1.8.4
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 1.8.4
+ description: A BioGem to submit jobs on a queue system
+ email: francesco.strozzi@gmail.com
+ executables:
+ - bio-grid
+ extensions: []
+ extra_rdoc_files:
+ - LICENSE.txt
+ - README.md
+ files:
+ - .document
+ - .rspec
+ - Gemfile
+ - Gemfile.lock
+ - LICENSE.txt
+ - README.md
+ - Rakefile
+ - VERSION
+ - bin/bio-grid
+ - bio-grid.gemspec
+ - lib/bio/grid.rb
+ - lib/bio/grid/job.rb
+ - lib/bioruby-grid.rb
+ - spec/bioruby-grid_spec.rb
+ - spec/spec_helper.rb
+ homepage: http://github.com/fstrozzi/bioruby-grid
+ licenses:
+ - MIT
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+       segments:
+       - 0
+       hash: -469848872734109697
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 1.8.24
+ signing_key:
+ specification_version: 3
+ summary: A BioGem to submit jobs on a queue system
+ test_files: []