bio-grid 0.2.0
- data/.document +5 -0
- data/.rspec +1 -0
- data/Gemfile +13 -0
- data/Gemfile.lock +31 -0
- data/LICENSE.txt +20 -0
- data/README.md +140 -0
- data/Rakefile +49 -0
- data/VERSION +1 -0
- data/bin/bio-grid +72 -0
- data/bio-grid.gemspec +64 -0
- data/lib/bio/grid.rb +49 -0
- data/lib/bio/grid/job.rb +78 -0
- data/lib/bioruby-grid.rb +2 -0
- data/spec/bioruby-grid_spec.rb +7 -0
- data/spec/spec_helper.rb +12 -0
- metadata +130 -0
data/.document
ADDED
data/.rspec
ADDED
@@ -0,0 +1 @@
|
|
1
|
+
--color
|
data/Gemfile
ADDED
@@ -0,0 +1,13 @@
|
|
1
|
+
source "http://rubygems.org"
|
2
|
+
# Add dependencies required to use your gem here.
|
3
|
+
# Example:
|
4
|
+
# gem "activesupport", ">= 2.3.5"
|
5
|
+
|
6
|
+
# Add dependencies to develop your gem here.
|
7
|
+
# Include everything needed to run rake, tests, features, etc.
|
8
|
+
group :development do
|
9
|
+
gem "rspec", "~> 2.8.0"
|
10
|
+
gem "rdoc", "~> 3.12"
|
11
|
+
gem "bundler", "> 1.0.0"
|
12
|
+
gem "jeweler", "~> 1.8.4"
|
13
|
+
end
|
data/Gemfile.lock
ADDED
@@ -0,0 +1,31 @@
|
|
1
|
+
GEM
|
2
|
+
remote: http://rubygems.org/
|
3
|
+
specs:
|
4
|
+
diff-lcs (1.1.3)
|
5
|
+
git (1.2.5)
|
6
|
+
jeweler (1.8.4)
|
7
|
+
bundler (~> 1.0)
|
8
|
+
git (>= 1.2.5)
|
9
|
+
rake
|
10
|
+
rdoc
|
11
|
+
json (1.7.5)
|
12
|
+
rake (0.9.2.2)
|
13
|
+
rdoc (3.12)
|
14
|
+
json (~> 1.4)
|
15
|
+
rspec (2.8.0)
|
16
|
+
rspec-core (~> 2.8.0)
|
17
|
+
rspec-expectations (~> 2.8.0)
|
18
|
+
rspec-mocks (~> 2.8.0)
|
19
|
+
rspec-core (2.8.0)
|
20
|
+
rspec-expectations (2.8.0)
|
21
|
+
diff-lcs (~> 1.1.2)
|
22
|
+
rspec-mocks (2.8.0)
|
23
|
+
|
24
|
+
PLATFORMS
|
25
|
+
ruby
|
26
|
+
|
27
|
+
DEPENDENCIES
|
28
|
+
bundler (> 1.0.0)
|
29
|
+
jeweler (~> 1.8.4)
|
30
|
+
rdoc (~> 3.12)
|
31
|
+
rspec (~> 2.8.0)
|
data/LICENSE.txt
ADDED
@@ -0,0 +1,20 @@
|
|
1
|
+
Copyright (c) 2012 Francesco Strozzi
|
2
|
+
|
3
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
4
|
+
a copy of this software and associated documentation files (the
|
5
|
+
"Software"), to deal in the Software without restriction, including
|
6
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
7
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
8
|
+
permit persons to whom the Software is furnished to do so, subject to
|
9
|
+
the following conditions:
|
10
|
+
|
11
|
+
The above copyright notice and this permission notice shall be
|
12
|
+
included in all copies or substantial portions of the Software.
|
13
|
+
|
14
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
15
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
16
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
17
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
18
|
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
19
|
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
20
|
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/README.md
ADDED
@@ -0,0 +1,140 @@
|
|
1
|
+
bioruby-grid
|
2
|
+
============
|
3
|
+
|
4
|
+
Utility to create and distribute jobs on a queue system. It is particularly suited to processing big data (e.g. NGS analyses), making it easy to generate hundreds of different jobs to crunch large datasets.
|
5
|
+
|
6
|
+
Usage
|
7
|
+
=====
|
8
|
+
|
9
|
+
This utility is a command-line tool built around the concept of a template that can be reused to generate tens, hundreds or thousands of different jobs to be sent to a queue system.
|
10
|
+
|
11
|
+
For now the tool supports only PBS queue systems, but it can easily be extended to other queueing systems.
|
12
|
+
|
13
|
+
A typical example
|
14
|
+
-----------------
|
15
|
+
|
16
|
+
Let's say I have a bunch of FastQ files that I want to analyze using my favorite read mapping tool. These files come from a typical Illumina paired-end sequencing run, and I have 60 files for read 1 and another 60 files for read 2. Given that I have a distributed system, I want to spread the alignments across the cluster (or grid) to speed up the analysis as much as possible.
|
17
|
+
|
18
|
+
Instead of manually creating a number of run scripts, or rewriting a new script for every analysis, you can let BioGrid save you time by handling all of this.
|
19
|
+
|
20
|
+
```shell
|
21
|
+
bio-grid -i "/data/Project_X/Sample_Y/*_R1_*.fastq.gz","/data/Project_X/Sample_Y/*_R2_*.fastq.gz" -n bowtie_mapping -c "/software/bowtie2 -x /genomes/genome_index -p 8 -1 <input1> -2 <input2> > <output>.sam" -o /data/Project_X/Sample_Y_mapping -s 1 -p 8
|
22
|
+
```
|
23
|
+
|
24
|
+
What is happening here is the following:
|
25
|
+
|
26
|
+
* the ```-i``` option specifies the input files or, as in this case, the location where to find input files based on a typical wildcard expression. You can specify as many input files/locations as you need using a comma separated list.
|
27
|
+
* the ```-n``` option specifies the job name
|
28
|
+
* the ```-c``` option is the command line to be executed on the cluster / grid system. BioGrid fills in the ```<input1>```, ```<input2>``` and ```<output>``` placeholders with the corresponding parameters passed on the command line. This is done for each input file (or each group of input files). If the ```<output>``` placeholder has an extension (like .sam, .out etc.), BioGrid generates a unique output file name for each job. IMPORTANT: If no extension is specified for the ```<output>``` placeholder, BioGrid assumes the job will generate more than one output file and that those files will be saved into the folder specified by the "-o" option. It will therefore manage the output as a whole directory, copying and/or removing the entire folder if the "-r" and "-e" options are present (check the [Other options](https://github.com/fstrozzi/bioruby-grid#other-options) section to see what these options are expected to do).
|
29
|
+
|
30
|
+
|
31
|
+
* the ```-o``` option sets the location where output files for each job will be saved. Only provide the folder where you want to save the output file(s); BioGrid will take care of generating a unique file name for the output, if needed.
|
32
|
+
* the ```-s``` option is a key parameter that sets the granularity of the jobs: the number of input files (or groups of files, when more than one input placeholder is present in the command line) to be used for each job. Going back to the FastQ example, if -s 1 is specified, each job will be run with exactly one FastQ R1 file and one FastQ R2 file. This gives you fine control over how to split the entire dataset analysis across multiple computing nodes.
|
33
|
+
* the ```-p``` parameter indicates how many processes we want to use for each job. This number needs to match the actual number of threads / processes that our command or tool will use for the analysis.
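The placeholder substitution described in the list above can be sketched in a few lines of Ruby. This is a simplified illustration, not the gem's actual code: the `fill_template` helper and the file paths are hypothetical, and the `_output_NNN` naming pattern is modelled on what `lib/bio/grid/job.rb` appears to do.

```ruby
# Hypothetical helper (not part of bio-grid): fill the <inputN> and <output>
# placeholders of a command-line template for one job.
def fill_template(template, inputs, output_dir, name, index, sep = ",")
  command = template.dup
  # <input1>, <input2>, ... are replaced by the files of the matching group.
  inputs.each_with_index do |files, i|
    command.gsub!("<input#{i + 1}>", files.join(sep))
  end
  if command =~ /<output>\.\S+/
    # An extension follows <output>: build a unique file name for this job.
    unique = format("%s/%s_output_%03d", output_dir, name, index + 1)
    command.gsub!("<output>", unique)
  else
    # No extension: the output is treated as a whole directory.
    command.gsub!("<output>", output_dir)
  end
  command
end

template = "/software/bowtie2 -x /genomes/genome_index -p 8 -1 <input1> -2 <input2> > <output>.sam"
inputs = [["/data/Sample_Y_L001_R1_001.fastq.gz"], ["/data/Sample_Y_L001_R2_001.fastq.gz"]]
command = fill_template(template, inputs, "/data/Sample_Y_mapping", "bowtie_mapping", 0)
puts command
```

When the template contains `<output>` without an extension, the same helper substitutes the output folder itself, which is the "whole directory" case described for the `-c` option.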
|
34
|
+
|
35
|
+
All of this is just turned into a submission script that will look like this:
|
36
|
+
|
37
|
+
```shell
|
38
|
+
#!/bin/bash
|
39
|
+
#PBS -N bowtie_mapping
|
40
|
+
#PBS -l ncpus=8
|
41
|
+
|
42
|
+
mkdir -p /data/Project_X/Sample_Y_mapping
|
43
|
+
/software/bowtie2 -x /genomes/genome_index -p 8 -1 /data/Project_X/Sample_Y/Sample_Y_L001_R1_001.fastq.gz -2 /data/Project_X/Sample_Y/Sample_Y_L001_R2_001.fastq.gz > /data/Project_X/Sample_Y_mapping/bowtie_mapping_output_001.sam
|
44
|
+
```
|
45
|
+
|
46
|
+
and this will be repeated for every input file, according to the -s parameter. In this case each command line takes 2 input files, we have 60 R1 and 60 R2 FastQ files, and we have specified "-s 1", so 60 different jobs will be created and submitted, each with a specific read pair to be processed by Bowtie.
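The way the -s value determines the number of jobs can be sketched as follows. This is a minimal illustration that mirrors the `each_slice` grouping used in `lib/bio/grid.rb`; the `group_inputs` helper and the generated file names are hypothetical stand-ins for the result of the two wildcard expressions.

```ruby
# Hypothetical file lists standing in for the files matched by the wildcards.
r1_files = (1..60).map { |i| format("Sample_Y_L001_R1_%03d.fastq.gz", i) }
r2_files = (1..60).map { |i| format("Sample_Y_L001_R2_%03d.fastq.gz", i) }

# Split a sorted file list into groups of `split_number` files each.
def group_inputs(files, split_number)
  files.sort.each_slice(split_number).to_a
end

# With -s 1 every group holds one file, so R1 and R2 groups pair up one to one.
r1_groups = group_inputs(r1_files, 1)
r2_groups = group_inputs(r2_files, 1)
jobs = r1_groups.zip(r2_groups)

puts jobs.size                        # number of jobs with -s 1
puts group_inputs(r1_files, 10).size  # number of jobs with -s 10
```

Sorting both lists before slicing is what keeps the R1 and R2 files of the same pair in the same position, so a naming scheme in which the pairs sort identically is assumed.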
|
47
|
+
|
48
|
+
Other options
|
49
|
+
-------------
|
50
|
+
|
51
|
+
With BioGrid you can specify many different tasks for the job to execute, for example:
|
52
|
+
|
53
|
+
* ```-t``` to execute only a single job, which is useful to test parameters
|
54
|
+
* ```-r``` to specify a different location from the one used in ```-o```. Job outputs will be copied to this folder once the job has finished
|
55
|
+
* ```-e``` to erase output files/folders specified by ```-o``` once a job is completed (useful in conjunction with ```-r``` to delete local data on a computing node)
|
56
|
+
* ```-d``` for a dry run, to create submission scripts without sending them to the queue system
|
57
|
+
|
58
|
+
The following BioGrid command line:
|
59
|
+
|
60
|
+
```shell
|
61
|
+
bio-grid -i "/data/Project_X/Sample_Y/*_R1_*.fastq.gz","/data/Project_X/Sample_Y/*_R2_*.fastq.gz" -n bowtie_mapping -c "/software/bowtie2 -x /genomes/genome_index -p 8 -1 <input1> -2 <input2> > <output>.sam" -o /data/Project_X/Sample_Y_mapping -s 1 -p 8 -r /results/Sample_Y_mapping -e
|
62
|
+
```
|
63
|
+
|
64
|
+
will be turned into this submission script:
|
65
|
+
|
66
|
+
```shell
|
67
|
+
#!/bin/bash
|
68
|
+
#PBS -N bowtie_mapping
|
69
|
+
#PBS -l ncpus=8
|
70
|
+
|
71
|
+
mkdir -p /data/Project_X/Sample_Y_mapping # output dir
|
72
|
+
/software/bowtie2 -x /genomes/genome_index -p 8 -1 /data/Project_X/Sample_Y/Sample_Y_L001_R1_001.fastq.gz -2 /data/Project_X/Sample_Y/Sample_Y_L001_R2_001.fastq.gz > /data/Project_X/Sample_Y_mapping/bowtie_mapping_output_001.sam # command line
|
73
|
+
mkdir -p /results/Sample_Y_mapping # final location where to copy job output once terminated
|
74
|
+
cp /data/Project_X/Sample_Y_mapping/bowtie_mapping_output_001.sam /results/Sample_Y_mapping # copy the outputs to the final location
|
75
|
+
rm -f /data/Project_X/Sample_Y_mapping/bowtie_mapping_output_001.sam # deleting output data
|
76
|
+
```
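The copy and delete steps at the end of the script above can be sketched in Ruby. This is a rough illustration modelled on the `append_options` method in `lib/bio/grid/job.rb`; the `post_steps` helper, its keyword arguments and the paths are hypothetical.

```ruby
# Hypothetical helper: shell lines appended after the main command,
# depending on whether -r (copy_to) and -e (erase) were given.
def post_steps(job_output, copy_to: nil, erase: false, output_is_folder: false)
  lines = []
  if copy_to
    lines << "mkdir -p #{copy_to}"
    # A whole output folder needs a recursive copy.
    lines << "#{output_is_folder ? 'cp -r' : 'cp'} #{job_output} #{copy_to}"
  end
  # Likewise, a folder needs a recursive delete.
  lines << "#{output_is_folder ? 'rm -fr' : 'rm -f'} #{job_output}" if erase
  lines
end

steps = post_steps("/data/Sample_Y_mapping/bowtie_mapping_output_001.sam",
                   copy_to: "/results/Sample_Y_mapping", erase: true)
puts steps
```

Running the copy before the delete is what makes the `-r` and `-e` combination safe to use for clearing local scratch space on a computing node.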
|
77
|
+
|
78
|
+
For a complete list of current BioGrid parameters, type "bio-grid -h":
|
79
|
+
|
80
|
+
```
|
81
|
+
-n, --name NAME Analysis name
|
82
|
+
-s, --split-number NUMBER Number of input files (or group of files) to use per job. If all the files in a location need to be used for a single job, just specify 'all'
|
83
|
+
-p, --processes PROCESSES Number of processes per job
|
84
|
+
-c, --command-line COMMANDLINE Command line to be executed
|
85
|
+
-o, --output OUTPUT Output folder
|
86
|
+
-r, --copy-to LOCATION Copy the output once a job is terminated
|
87
|
+
-e, --erease-output Delete job output data when completed (useful to delete output temporary files on a computing node)
|
88
|
+
-d, --dry Dry run. Just write the job scripts without sending them in queue (for debugging or testing)
|
89
|
+
-t, --test Start the mapping only with the first group of reads (e.g. for testing parameters)
|
90
|
+
-i, --input INPUT1,INPUT2... Location where to find input files (accepts wildcards). You can specify more than one input location, just provide a comma separated list
|
91
|
+
--sep SEPARATOR Input file separator [Default: , ]
|
92
|
+
--keep-scripts Keep all the running scripts created for all the jobs
|
93
|
+
-h, --help Display this screen
|
94
|
+
```
|
95
|
+
|
96
|
+
Advanced stuff
|
97
|
+
==============
|
98
|
+
|
99
|
+
Ok let's unleash the potential of BioGrid.
|
100
|
+
By putting together an automatic system to generate and submit jobs on a queue system and a command line template approach, we can do some interesting things.
|
101
|
+
|
102
|
+
Parameters sampling and testing
|
103
|
+
-------------------------------
|
104
|
+
|
105
|
+
The typical scenario is when I have to run a tool on a new dataset and I would like to test different parameters to assess which are the best ones for my analysis.
|
106
|
+
This can be easily done with BioGrid. For example:
|
107
|
+
|
108
|
+
```shell
|
109
|
+
bio-grid -i "/data/Project_X/Sample_Y/*_R1_*.fastq.gz","/data/Project_X/Sample_Y/*_R2_*.fastq.gz" -n bowtie_mapping -c "/software/bowtie2 -x /genomes/genome_index -p 8 -L <22,32,2> -1 <input1> -2 <input2> > <output>.sam" -o /data/Project_X/Sample_Y_mapping -s 1 -p 8 -r /results/Sample_Y_mapping -e -t
|
110
|
+
```
|
111
|
+
|
112
|
+
The key points here are the ```-L <22,32,2>``` in the command line template and the ```-t``` option of BioGrid. The first is a way to tell BioGrid to generate a number of similar jobs, each one with a different value for the parameter ```-L```. The values are decided based on the information passed within the ```< >```:
|
113
|
+
|
114
|
+
* the first number is the first value that the parameter will take
|
115
|
+
* the second number is the last value that the parameter will take
|
116
|
+
* the third number is the increment to generate the range of values in between
|
117
|
+
|
118
|
+
So in this case, the ```-L``` parameter will take 6 different values: 22, 24, 26, 28, 30 and 32.
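The expansion of such a placeholder can be sketched in Ruby. This is an illustrative sketch and not the gem's exact code: the `expand_range` helper and the `PLACEHOLDER` constant are hypothetical, and the default increment of 1 when the third number is omitted is modelled on `lib/bio/grid.rb`.

```ruby
# Matches a "<first,last>" or "<first,last,step>" placeholder.
PLACEHOLDER = /<(\d+),(\d+)(?:,(\d+))?>/

# Hypothetical helper: list the values covered by the placeholder in a template.
def expand_range(template)
  match = template.match(PLACEHOLDER)
  return [] unless match
  first = match[1].to_i
  last = match[2].to_i
  step = match[3] ? match[3].to_i : 1 # assumed default increment
  first.step(last, step).to_a
end

template = "bowtie2 -L <22,32,2> -1 <input1> -2 <input2>"
values = expand_range(template)
p values

# One command line per sampled value.
commands = values.map { |value| template.sub(PLACEHOLDER, value.to_s) }
puts commands.first
```

The pattern only matches digits after the opening `<`, so the `<input1>`, `<input2>` and `<output>` placeholders are left untouched.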
|
119
|
+
|
120
|
+
Last but not least, the ```-t``` option is essential so that only the first input file (or group of files) is used, with one job per parameter value. Sampling parameter values is a typical combinatorial approach and this option avoids generating hundreds of different jobs only to sample a parameter. Coming back to the initial example, if I have 60 pairs of FastQ files, without the ```-t``` option, the job number will be 60x6 = 360, which is just crazy when you only want to test different parameter values.
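The job count arithmetic can be checked with a short Ruby sketch. It assumes, based on the `break if options[:test]` line in `lib/bio/grid.rb`, that ```-t``` stops after the first input group while still running every sampled value; the `job_count` helper is hypothetical.

```ruby
# Values sampled by "-L <22,32,2>".
parameter_values = (22..32).step(2).to_a

# Hypothetical helper: number of jobs generated for a parameter sampling run.
def job_count(input_groups, values, test_mode)
  groups_used = test_mode ? 1 : input_groups # -t keeps only the first group
  groups_used * values.size
end

puts job_count(60, parameter_values, false) # without -t
puts job_count(60, parameter_values, true)  # with -t
```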
|
121
|
+
|
122
|
+
So far, BioGrid does not support sampling more than one parameter at the same time.
|
123
|
+
|
124
|
+
Contributing to bioruby-grid
|
125
|
+
============================
|
126
|
+
|
127
|
+
* Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
|
128
|
+
* Check out the issue tracker to make sure someone already hasn't requested it and/or contributed it.
|
129
|
+
* Fork the project.
|
130
|
+
* Start a feature/bugfix branch.
|
131
|
+
* Commit and push until you are happy with your contribution.
|
132
|
+
* Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
|
133
|
+
* Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.
|
134
|
+
|
135
|
+
Copyright
|
136
|
+
=========
|
137
|
+
|
138
|
+
Copyright (c) 2012 Francesco Strozzi. See LICENSE.txt for
|
139
|
+
further details.
|
140
|
+
|
data/Rakefile
ADDED
@@ -0,0 +1,49 @@
|
|
1
|
+
# encoding: utf-8
|
2
|
+
|
3
|
+
require 'rubygems'
|
4
|
+
require 'bundler'
|
5
|
+
begin
|
6
|
+
Bundler.setup(:default, :development)
|
7
|
+
rescue Bundler::BundlerError => e
|
8
|
+
$stderr.puts e.message
|
9
|
+
$stderr.puts "Run `bundle install` to install missing gems"
|
10
|
+
exit e.status_code
|
11
|
+
end
|
12
|
+
require 'rake'
|
13
|
+
|
14
|
+
require 'jeweler'
|
15
|
+
Jeweler::Tasks.new do |gem|
|
16
|
+
# gem is a Gem::Specification... see http://docs.rubygems.org/read/chapter/20 for more options
|
17
|
+
gem.name = "bio-grid"
|
18
|
+
gem.homepage = "http://github.com/fstrozzi/bioruby-grid"
|
19
|
+
gem.license = "MIT"
|
20
|
+
gem.summary = %Q{A BioGem to submit jobs on a queue system}
|
21
|
+
gem.description = %{A BioGem to submit jobs on a queue system}
|
22
|
+
gem.email = "francesco.strozzi@gmail.com"
|
23
|
+
gem.authors = ["Francesco Strozzi"]
|
24
|
+
# dependencies defined in Gemfile
|
25
|
+
end
|
26
|
+
Jeweler::RubygemsDotOrgTasks.new
|
27
|
+
|
28
|
+
require 'rspec/core'
|
29
|
+
require 'rspec/core/rake_task'
|
30
|
+
RSpec::Core::RakeTask.new(:spec) do |spec|
|
31
|
+
spec.pattern = FileList['spec/**/*_spec.rb']
|
32
|
+
end
|
33
|
+
|
34
|
+
RSpec::Core::RakeTask.new(:rcov) do |spec|
|
35
|
+
spec.pattern = 'spec/**/*_spec.rb'
|
36
|
+
spec.rcov = true
|
37
|
+
end
|
38
|
+
|
39
|
+
task :default => :spec
|
40
|
+
|
41
|
+
require 'rdoc/task'
|
42
|
+
Rake::RDocTask.new do |rdoc|
|
43
|
+
version = File.exist?('VERSION') ? File.read('VERSION') : ""
|
44
|
+
|
45
|
+
rdoc.rdoc_dir = 'rdoc'
|
46
|
+
rdoc.title = "bioruby-grid #{version}"
|
47
|
+
rdoc.rdoc_files.include('README*')
|
48
|
+
rdoc.rdoc_files.include('lib/**/*.rb')
|
49
|
+
end
|
data/VERSION
ADDED
@@ -0,0 +1 @@
|
|
1
|
+
0.2.0
|
data/bin/bio-grid
ADDED
@@ -0,0 +1,72 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
|
3
|
+
require 'optparse'
|
4
|
+
$:<< File.expand_path(File.join(File.dirname(File.dirname __FILE__),"lib"))
|
5
|
+
require 'bioruby-grid'
|
6
|
+
|
7
|
+
options = {}
|
8
|
+
options[:sep] = ","
|
9
|
+
|
10
|
+
optparse = OptionParser.new do |opts|
|
11
|
+
opts.banner = "\nCopyright(c) 2012 Francesco Strozzi\nUtility to create and distribute jobs on a queue system.\n\nE.g. #{$0} -i \"/Project_X/Sample_Y/*_R1_*.fastq\",\"/Project_X/Sample_Y/*_R2_*.fastq\" --name bowtie2 -s 10 -p 12 --command-line \"/software/bowtie2 -x /genomes/bowtie2_index/genome_index -1 <input1> -2 <input2> -p 12 > <output>.sam\" --output /tmp/Sample_Y_mapping --copy-to /archive/Sample_Y_mapping --erease-output\n\n\n"
|
12
|
+
|
13
|
+
opts.on("-n","--name NAME","Analysis name") do |name|
|
14
|
+
options[:name] = name
|
15
|
+
end
|
16
|
+
|
17
|
+
opts.on("-s","--split-number NUMBER","Number of input files (or group of files) to use per job. If all the files in a location need to be used for a single job, just specify 'all'") do |number|
|
18
|
+
options[:number] = number
|
19
|
+
end
|
20
|
+
|
21
|
+
opts.on("-p","--processes PROCESSES","Number of processes per job") do |processes|
|
22
|
+
options[:processes] = processes
|
23
|
+
end
|
24
|
+
|
25
|
+
opts.on("-c","--command-line COMMANDLINE","Command line to be executed") do |cmd|
|
26
|
+
options[:cmd] = cmd
|
27
|
+
end
|
28
|
+
|
29
|
+
opts.on("-o","--output OUTPUT","Output folder") do |out|
|
30
|
+
options[:output] = out
|
31
|
+
end
|
32
|
+
|
33
|
+
opts.on("-r","--copy-to LOCATION","Copy the output once a job is terminated") do |location|
|
34
|
+
options[:copy] = location
|
35
|
+
end
|
36
|
+
|
37
|
+
opts.on("-e","--erease-output","Delete job output data when completed (useful to delete output temporary files on a computing node)") do |clean|
|
38
|
+
options[:clean] = true
|
39
|
+
end
|
40
|
+
|
41
|
+
opts.on("-d","--dry","Dry run. Just write the job scripts without sending them in queue (for debugging or testing)") {options[:dry] = true}
|
42
|
+
|
43
|
+
opts.on("-t","--test","Start the mapping only with the first group of reads (e.g. for testing parameters)") do |test|
|
44
|
+
options[:test] = true
|
45
|
+
end
|
46
|
+
|
47
|
+
opts.on("-i","--input INPUT1,INPUT2...",Array,"Location where to find input files (accepts wildcards). You can specify more than one input location, just provide a comma separated list") do |input|
|
48
|
+
options[:input] = input
|
49
|
+
end
|
50
|
+
|
51
|
+
opts.on("--sep SEPARATOR","Input file separator [Default: , ]") do |sep|
|
52
|
+
options[:sep] = sep
|
53
|
+
end
|
54
|
+
|
55
|
+
opts.on("--keep-scripts","Keep all the running scripts created for all the jobs") {options[:keep] = true}
|
56
|
+
|
57
|
+
opts.on("-h","--help","Display this screen") do
|
58
|
+
puts opts
|
59
|
+
print "\n"
|
60
|
+
end
|
61
|
+
end
|
62
|
+
|
63
|
+
optparse.parse!
|
64
|
+
|
65
|
+
raise OptionParser::MissingArgument,"-i, --input [INPUT1,INPUT2...]\n" if options[:input].nil?
|
66
|
+
raise OptionParser::MissingArgument,"-c, --command-line [command line]\n" if options[:cmd].nil?
|
67
|
+
raise OptionParser::MissingArgument,"-n, --name [analysis name]\n" if options[:name].nil?
|
68
|
+
raise OptionParser::MissingArgument,"-o, --output [output folder]\n" if options[:output].nil?
|
69
|
+
|
70
|
+
Bio::Grid.run(options)
|
71
|
+
|
72
|
+
|
data/bio-grid.gemspec
ADDED
@@ -0,0 +1,64 @@
|
|
1
|
+
# Generated by jeweler
|
2
|
+
# DO NOT EDIT THIS FILE DIRECTLY
|
3
|
+
# Instead, edit Jeweler::Tasks in Rakefile, and run 'rake gemspec'
|
4
|
+
# -*- encoding: utf-8 -*-
|
5
|
+
|
6
|
+
Gem::Specification.new do |s|
|
7
|
+
s.name = "bio-grid"
|
8
|
+
s.version = "0.2.0"
|
9
|
+
|
10
|
+
s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
|
11
|
+
s.authors = ["Francesco Strozzi"]
|
12
|
+
s.date = "2012-09-20"
|
13
|
+
s.description = "A BioGem to submit jobs on a queue system"
|
14
|
+
s.email = "francesco.strozzi@gmail.com"
|
15
|
+
s.executables = ["bio-grid"]
|
16
|
+
s.extra_rdoc_files = [
|
17
|
+
"LICENSE.txt",
|
18
|
+
"README.md"
|
19
|
+
]
|
20
|
+
s.files = [
|
21
|
+
".document",
|
22
|
+
".rspec",
|
23
|
+
"Gemfile",
|
24
|
+
"Gemfile.lock",
|
25
|
+
"LICENSE.txt",
|
26
|
+
"README.md",
|
27
|
+
"Rakefile",
|
28
|
+
"VERSION",
|
29
|
+
"bin/bio-grid",
|
30
|
+
"bio-grid.gemspec",
|
31
|
+
"lib/bio/grid.rb",
|
32
|
+
"lib/bio/grid/job.rb",
|
33
|
+
"lib/bioruby-grid.rb",
|
34
|
+
"spec/bioruby-grid_spec.rb",
|
35
|
+
"spec/spec_helper.rb"
|
36
|
+
]
|
37
|
+
s.homepage = "http://github.com/fstrozzi/bioruby-grid"
|
38
|
+
s.licenses = ["MIT"]
|
39
|
+
s.require_paths = ["lib"]
|
40
|
+
s.rubygems_version = "1.8.24"
|
41
|
+
s.summary = "A BioGem to submit jobs on a queue system"
|
42
|
+
|
43
|
+
if s.respond_to? :specification_version then
|
44
|
+
s.specification_version = 3
|
45
|
+
|
46
|
+
if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
|
47
|
+
s.add_development_dependency(%q<rspec>, ["~> 2.8.0"])
|
48
|
+
s.add_development_dependency(%q<rdoc>, ["~> 3.12"])
|
49
|
+
s.add_development_dependency(%q<bundler>, ["> 1.0.0"])
|
50
|
+
s.add_development_dependency(%q<jeweler>, ["~> 1.8.4"])
|
51
|
+
else
|
52
|
+
s.add_dependency(%q<rspec>, ["~> 2.8.0"])
|
53
|
+
s.add_dependency(%q<rdoc>, ["~> 3.12"])
|
54
|
+
s.add_dependency(%q<bundler>, ["> 1.0.0"])
|
55
|
+
s.add_dependency(%q<jeweler>, ["~> 1.8.4"])
|
56
|
+
end
|
57
|
+
else
|
58
|
+
s.add_dependency(%q<rspec>, ["~> 2.8.0"])
|
59
|
+
s.add_dependency(%q<rdoc>, ["~> 3.12"])
|
60
|
+
s.add_dependency(%q<bundler>, ["> 1.0.0"])
|
61
|
+
s.add_dependency(%q<jeweler>, ["~> 1.8.4"])
|
62
|
+
end
|
63
|
+
end
|
64
|
+
|
data/lib/bio/grid.rb
ADDED
@@ -0,0 +1,49 @@
|
|
1
|
+
module Bio
|
2
|
+
|
3
|
+
class Grid
|
4
|
+
|
5
|
+
attr_accessor :input,:number
|
6
|
+
def initialize(input,number)
|
7
|
+
@input = input
|
8
|
+
@number = number
|
9
|
+
end
|
10
|
+
|
11
|
+
def self.run(options)
|
12
|
+
grid = self.new options[:input], options[:number]
|
13
|
+
groups = grid.prepare_input_groups
|
14
|
+
inputs = groups.keys.sort
|
15
|
+
groups[inputs.shift].each_with_index do |input1,index|
|
16
|
+
|
17
|
+
if options[:cmd]=~/<(\d+),(\d+)(,\d+)*>/
|
18
|
+
step = ($3) ? $3.tr(",","").to_i : 1
|
19
|
+
range = Range.new($1.to_i,$2.to_i,false).step(step).to_a
|
20
|
+
range.each do |value|
|
21
|
+
cmd_line = options[:cmd].gsub(/<(\d+),(\d+)(,\d+)*>/,value.to_s)
|
22
|
+
job = Bio::Grid::Job.new(options) # inherit global options
|
23
|
+
job.options[:parameter_value] = "-param-#{value}"
|
24
|
+
job.execute(cmd_line,inputs,input1,groups,index)
|
25
|
+
end
|
26
|
+
else
|
27
|
+
job = Bio::Grid::Job.new(options) # inherit global options
|
28
|
+
job.execute(options[:cmd],inputs,input1,groups,index)
|
29
|
+
end
|
30
|
+
|
31
|
+
break if options[:test]
|
32
|
+
end
|
33
|
+
end
|
34
|
+
|
35
|
+
def prepare_input_groups
|
36
|
+
groups = Hash.new {|h,k| h[k] = [] }
|
37
|
+
self.input.each_with_index do |location,index|
|
38
|
+
if self.number == "all"
|
39
|
+
groups["input"] << Dir.glob(location).sort
|
40
|
+
else
|
41
|
+
Dir.glob(location).sort.each_slice(self.number.to_i) {|subgroup| groups["input#{index+1}"] << subgroup}
|
42
|
+
end
|
43
|
+
end
|
44
|
+
groups
|
45
|
+
end
|
46
|
+
|
47
|
+
end
|
48
|
+
|
49
|
+
end
|
data/lib/bio/grid/job.rb
ADDED
@@ -0,0 +1,78 @@
|
|
1
|
+
module Bio
|
2
|
+
class Grid
|
3
|
+
class Job
|
4
|
+
|
5
|
+
attr_accessor :options, :instructions, :job_output, :runner
|
6
|
+
def initialize(options)
|
7
|
+
@options = options
|
8
|
+
self.instructions = ""
|
9
|
+
end
|
10
|
+
|
11
|
+
def set_output_dir
|
12
|
+
p "mkdir -p #{self.options[:output]}\n"
|
13
|
+
self.instructions << ("mkdir -p #{self.options[:output]}\n")
|
14
|
+
end
|
15
|
+
|
16
|
+
def set_commandline(cmd_line,inputs,input1,groups,index)
|
17
|
+
commandline = cmd_line.gsub(/<input1>|<input>/,input1.join(self.options[:sep]))
|
18
|
+
inputs.each do |input|
|
19
|
+
commandline.gsub!(/<#{input}>/,groups[input][index].join(self.options[:sep]))
|
20
|
+
end
|
21
|
+
job_output = ""
|
22
|
+
if commandline =~/<output>\.(\S+)/
|
23
|
+
extension = $1
|
24
|
+
job_output = self.options[:output]+"/"+self.options[:name]+"_output_%03d" % (index+1).to_s + "#{self.options[:parameter_value]}"
|
25
|
+
commandline.gsub!(/<output>/,job_output)
|
26
|
+
job_output << ".#{extension}"
|
27
|
+
else
|
28
|
+
self.options[:output_folder] = true
|
29
|
+
commandline.gsub!(/<output>/,self.options[:output])
|
30
|
+
job_output = self.options[:output]
|
31
|
+
end
|
32
|
+
self.instructions << commandline+"\n"
|
33
|
+
self.job_output = job_output
|
34
|
+
end
|
35
|
+
|
36
|
+
def append_options
|
37
|
+
if self.options[:copy]
|
38
|
+
self.instructions << ("mkdir -p #{self.options[:copy]}\n")
|
39
|
+
copy_type = (self.options[:output_folder]) ? "cp -r" : "cp"
|
40
|
+
self.instructions << ("#{copy_type} #{self.job_output} #{self.options[:copy]}\n")
|
41
|
+
end
|
42
|
+
|
43
|
+
if self.options[:clean]
|
44
|
+
rm_type = (self.options[:output_folder]) ? "rm -fr" : "rm -f"
|
45
|
+
self.instructions << ("#{rm_type} #{self.job_output}\n")
|
46
|
+
end
|
47
|
+
end
|
48
|
+
|
49
|
+
def write_runner(filename)
|
50
|
+
self.runner = filename
|
51
|
+
out = File.open(Dir.pwd+"/"+filename,"w")
|
52
|
+
out.write(self.instructions+"\n")
|
53
|
+
out.close
|
54
|
+
p filename
|
55
|
+
end
|
56
|
+
|
57
|
+
def run(filename)
|
58
|
+
self.write_runner(filename)
|
59
|
+
system("qsub #{self.runner}") unless self.options[:dry]
|
60
|
+
end
|
61
|
+
|
62
|
+
def set_scheduler_options(type)
|
63
|
+
self.instructions << "#!/bin/bash\n#PBS -N #{self.options[:name]}\n#PBS -l ncpus=#{self.options[:processes]}\n\n" if type == :pbs
|
64
|
+
end
|
65
|
+
|
66
|
+
def execute(command_line,inputs,input1,groups,index)
|
67
|
+
self.set_scheduler_options(:pbs) # set script specific options for the scheduling system
|
68
|
+
self.set_output_dir
|
69
|
+
self.set_commandline(command_line,inputs,input1,groups,index)
|
70
|
+
self.append_options
|
71
|
+
job_filename = (self.options[:keep]) ? "job_#{index+1}#{self.options[:parameter_value]}.sh" : "job.sh"
|
72
|
+
self.run(job_filename)
|
73
|
+
end
|
74
|
+
|
75
|
+
|
76
|
+
end
|
77
|
+
end
|
78
|
+
end
|
data/lib/bioruby-grid.rb
ADDED
data/spec/spec_helper.rb
ADDED
@@ -0,0 +1,12 @@
|
|
1
|
+
$LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
|
2
|
+
$LOAD_PATH.unshift(File.dirname(__FILE__))
|
3
|
+
require 'rspec'
|
4
|
+
require 'bioruby-grid'
|
5
|
+
|
6
|
+
# Requires supporting files with custom matchers and macros, etc,
|
7
|
+
# in ./support/ and its subdirectories.
|
8
|
+
Dir["#{File.dirname(__FILE__)}/support/**/*.rb"].each {|f| require f}
|
9
|
+
|
10
|
+
RSpec.configure do |config|
|
11
|
+
|
12
|
+
end
|
metadata
ADDED
@@ -0,0 +1,130 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
name: bio-grid
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 0.2.0
|
5
|
+
prerelease:
|
6
|
+
platform: ruby
|
7
|
+
authors:
|
8
|
+
- Francesco Strozzi
|
9
|
+
autorequire:
|
10
|
+
bindir: bin
|
11
|
+
cert_chain: []
|
12
|
+
date: 2012-09-20 00:00:00.000000000 Z
|
13
|
+
dependencies:
|
14
|
+
- !ruby/object:Gem::Dependency
|
15
|
+
name: rspec
|
16
|
+
requirement: !ruby/object:Gem::Requirement
|
17
|
+
none: false
|
18
|
+
requirements:
|
19
|
+
- - ~>
|
20
|
+
- !ruby/object:Gem::Version
|
21
|
+
version: 2.8.0
|
22
|
+
type: :development
|
23
|
+
prerelease: false
|
24
|
+
version_requirements: !ruby/object:Gem::Requirement
|
25
|
+
none: false
|
26
|
+
requirements:
|
27
|
+
- - ~>
|
28
|
+
- !ruby/object:Gem::Version
|
29
|
+
version: 2.8.0
|
30
|
+
- !ruby/object:Gem::Dependency
|
31
|
+
name: rdoc
|
32
|
+
requirement: !ruby/object:Gem::Requirement
|
33
|
+
none: false
|
34
|
+
requirements:
|
35
|
+
- - ~>
|
36
|
+
- !ruby/object:Gem::Version
|
37
|
+
version: '3.12'
|
38
|
+
type: :development
|
39
|
+
prerelease: false
|
40
|
+
version_requirements: !ruby/object:Gem::Requirement
|
41
|
+
none: false
|
42
|
+
requirements:
|
43
|
+
- - ~>
|
44
|
+
- !ruby/object:Gem::Version
|
45
|
+
version: '3.12'
|
46
|
+
- !ruby/object:Gem::Dependency
|
47
|
+
name: bundler
|
48
|
+
requirement: !ruby/object:Gem::Requirement
|
49
|
+
none: false
|
50
|
+
requirements:
|
51
|
+
- - ! '>'
|
52
|
+
- !ruby/object:Gem::Version
|
53
|
+
version: 1.0.0
|
54
|
+
type: :development
|
55
|
+
prerelease: false
|
56
|
+
version_requirements: !ruby/object:Gem::Requirement
|
57
|
+
none: false
|
58
|
+
requirements:
|
59
|
+
- - ! '>'
|
60
|
+
- !ruby/object:Gem::Version
|
61
|
+
version: 1.0.0
|
62
|
+
- !ruby/object:Gem::Dependency
|
63
|
+
name: jeweler
|
64
|
+
requirement: !ruby/object:Gem::Requirement
|
65
|
+
none: false
|
66
|
+
requirements:
|
67
|
+
- - ~>
|
68
|
+
- !ruby/object:Gem::Version
|
69
|
+
version: 1.8.4
|
70
|
+
type: :development
|
71
|
+
prerelease: false
|
72
|
+
version_requirements: !ruby/object:Gem::Requirement
|
73
|
+
none: false
|
74
|
+
requirements:
|
75
|
+
- - ~>
|
76
|
+
- !ruby/object:Gem::Version
|
77
|
+
version: 1.8.4
|
78
|
+
description: A BioGem to submit jobs on a queue system
|
79
|
+
email: francesco.strozzi@gmail.com
|
80
|
+
executables:
|
81
|
+
- bio-grid
|
82
|
+
extensions: []
|
83
|
+
extra_rdoc_files:
|
84
|
+
- LICENSE.txt
|
85
|
+
- README.md
|
86
|
+
files:
|
87
|
+
- .document
|
88
|
+
- .rspec
|
89
|
+
- Gemfile
|
90
|
+
- Gemfile.lock
|
91
|
+
- LICENSE.txt
|
92
|
+
- README.md
|
93
|
+
- Rakefile
|
94
|
+
- VERSION
|
95
|
+
- bin/bio-grid
|
96
|
+
- bio-grid.gemspec
|
97
|
+
- lib/bio/grid.rb
|
98
|
+
- lib/bio/grid/job.rb
|
99
|
+
- lib/bioruby-grid.rb
|
100
|
+
- spec/bioruby-grid_spec.rb
|
101
|
+
- spec/spec_helper.rb
|
102
|
+
homepage: http://github.com/fstrozzi/bioruby-grid
|
103
|
+
licenses:
|
104
|
+
- MIT
|
105
|
+
post_install_message:
|
106
|
+
rdoc_options: []
|
107
|
+
require_paths:
|
108
|
+
- lib
|
109
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
110
|
+
none: false
|
111
|
+
requirements:
|
112
|
+
- - ! '>='
|
113
|
+
- !ruby/object:Gem::Version
|
114
|
+
version: '0'
|
115
|
+
segments:
|
116
|
+
- 0
|
117
|
+
hash: -469848872734109697
|
118
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
119
|
+
none: false
|
120
|
+
requirements:
|
121
|
+
- - ! '>='
|
122
|
+
- !ruby/object:Gem::Version
|
123
|
+
version: '0'
|
124
|
+
requirements: []
|
125
|
+
rubyforge_project:
|
126
|
+
rubygems_version: 1.8.24
|
127
|
+
signing_key:
|
128
|
+
specification_version: 3
|
129
|
+
summary: A BioGem to submit jobs on a queue system
|
130
|
+
test_files: []
|