bio-grid 0.2.7 → 0.3.0
- data/README.md +33 -4
- data/VERSION +1 -1
- data/bio-grid.gemspec +1 -1
- data/lib/bio/grid/job.rb +1 -3
- metadata +2 -2
data/README.md
CHANGED
@@ -25,10 +25,8 @@ What is happening here is the following:
 
 * the ```-i``` options specifies the input files or, as in this case, the location where to find input files based on a typical wildcard expression. You can actually specify as many input files/locations as you need using a comma separated list.
 * the ```-n``` specify the job name
-* the ```-c``` is the command line to be executed on the cluster / grid system. What BioGrid does is to fill in the ```<input1>```,```<input2>``` and ```<output>``` placeholders with the corresponding parameters passed on the command line. This is done for each input file (or each group of input files) and BioGrid will check if the ```<output>``` placeholder has an extension (like .sam, .out etc.) and will generate a unique output file name for each job.
-
-
-* the ```-o``` set the location where output files for each job will be saved. Only provide the folder where you want to save the output file(s), BioGrid will take care of generating a unique file name for the output, if needed.
+* the ```-c``` is the command line to be executed on the cluster / grid system. What BioGrid does is to fill in the ```<input1>```,```<input2>``` and ```<output>``` placeholders with the corresponding parameters passed on the command line. This is done for each input file (or each group of input files) and BioGrid will check if the ```<output>``` placeholder has an extension (like .sam, .out etc.) and will generate a unique output file name for each job.
+* the ```-o``` set the location where output files for each job will be saved. Only provide the folder where you want to save the output file(s), BioGrid will take care of generating a unique file name for the output, if needed. Check the [Output management](https://github.com/fstrozzi/bioruby-grid#output-management) for more details.
 * the ```-s``` is a key parameter to specify the granularity of the jobs, setting the number of input files (or group of files, when more than one input placeholder is present in the command line) to be used for each job. So, going back to the FastQ example, if -s 1 is specified, each job will be run with exactly one FastQ R1 file and one FastQ R2 file. This gives you a great power in deciding how to split the entire dataset analysis across multiple computing nodes.
 * the ```-p``` parameter indicates how many processes we want to use for each job. This number needs to match with the actual number of threads / processes that our command or tool will use for the analysis.
 
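Reviewer note: the interplay between ```-s``` and the ```<input1>``` / ```<input2>``` placeholders described in this hunk can be illustrated with a minimal sketch. It is not BioGrid's implementation: `build_commands`, the `my_tool` command line and the file names are hypothetical, and the comma separator is an assumption. Only the idea of slicing each input list into groups and joining each group into its placeholder comes from the README and from `job.rb`.

```ruby
# Hypothetical helper, not part of the bio-grid API.
# template: command line containing <input1>, <input2>, ... placeholders
# inputs:   placeholder name => list of input files
# step:     number of files per job (the role played by -s)
# sep:      separator used to join the files of one group (assumed to be ",")
def build_commands(template, inputs, step, sep = ",")
  # Slice every input list into groups of `step` files.
  groups = inputs.transform_values { |files| files.each_slice(step).to_a }
  # One job per group; use the shortest list so every placeholder has a group.
  job_count = groups.values.map(&:size).min
  (0...job_count).map do |index|
    # Fill each placeholder with the files of the current group.
    groups.reduce(template) do |cmd, (placeholder, slices)|
      cmd.gsub("<#{placeholder}>", slices[index].join(sep))
    end
  end
end

inputs = {
  "input1" => %w[s_R1_001.fastq s_R1_002.fastq s_R1_003.fastq],
  "input2" => %w[s_R2_001.fastq s_R2_002.fastq s_R2_003.fastq]
}

# With a step of 1, each job receives exactly one R1 file and one R2 file.
commands = build_commands("my_tool --r1 <input1> --r2 <input2> > <output>.sam", inputs, 1)
puts commands
```

With three read pairs and a step of 1 the sketch yields three command lines; a step of 2 yields two, the first of which receives two comma-joined files per placeholder. The ```<output>``` placeholder is deliberately left untouched here.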
@@ -45,6 +43,37 @@ mkdir -p /data/Project_X/Sample_Y_mapping
 
 and this will be repeated for every input file, according to the -s parameter. So, in this case given that we have 2 input files for each command line and that we had 60 R1 and 60 R2 FastQ files and we have specified "-s 1", 60 different jobs will be created and submitted, each with a specific read pair to be processed by Bowtie.
 
+Output management
+-----------------
+For each job, BioGrid will set an output name according to a UUID generated on the fly and the combination of the job name plus an incremental number. So a typical output file name will look like this:
+
+```shell
+3cb0b800_Bowtie_mapping_001.bam
+```
+IMPORTANT: the UUID will be the same for all the jobs submitted in a same BioGrid run, the only changing part will be the incremental number.
+
+If you want to do some [Advanced stuff](https://github.com/fstrozzi/bioruby-grid#advanced-stuff) and run parameters testing, the output names will be changed accordingly by BioGrid. So if I am running BioGrid to test some parameter ```-L``` for my favorite tool, and I am sampling it, with three different values, let's say 3, 7 and 10 the corresponding output files will be:
+
+```shell
+9ec55d90_tophat_001-param:3.sam
+9ec55d90_tophat_001-param:7.sam
+9ec55d90_tophat_001-param:10.sam
+```
+
+If you are using the ```--param``` options to test non-numerical parameters, the corresponding parameter value (or name) will be appended to the output file name in the same way:
+
+```shell
+9ec55d90_tophat_001-param:--sensitive.sam
+9ec55d90_tophat_001-param:--fast.sam
+```
+
+###Differences between output files and output folder
+
+BioGrid will act differently if the output of a single job is a file or a folder. You need to specify this, by adding a file extension to the ```<output>``` placeholder. So, for instance, if the output file of my job is a BAM file, I will need to specify this in the command line definition, by putting a ```<output>.bam``` .
+**If no extension is specified for the ```<output>``` placeholder in the command line definition, BioGrid will assume the job will generate more than one output file and that those files will be saved into the folder specified by the ```-o``` option**. Therefore it will manage the output as a whole directory, copying and/or removing the entire folder if ```-r``` and ```-e``` options are present (check the [Other options](https://github.com/fstrozzi/bioruby-grid#other-options) section to see what these options are expected to do).
+
+The naming conventions for the output folder are same as for the output files.
+
 Other options
 -------------
 
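Reviewer note: the naming scheme documented in the new "Output management" section can be reproduced with a short sketch. `output_name` is a hypothetical helper, not a BioGrid method. The `-param:VALUE` suffix format is inferred from the README examples, and taking the first segment of a UUID as the run identifier is an assumption based on the 8-character prefixes shown there.

```ruby
require "securerandom"

# Hypothetical helper mirroring the naming scheme described in the README:
#   <run id>_<job name>_<3-digit job number>[-param:<value>][.<extension>]
def output_name(run_id, job_name, index, extension: nil, param: nil)
  name = format("%s_%s_%03d", run_id, job_name, index)
  name += "-param:#{param}" if param        # suffix format inferred from the README examples
  name += ".#{extension}" if extension      # no extension means the output is a folder
  name
end

# Assumption: one short identifier per run, shared by every job of that run.
run_id = SecureRandom.uuid.split("-").first

puts output_name(run_id, "Bowtie_mapping", 1, extension: "bam")
puts output_name(run_id, "tophat", 1, extension: "sam", param: 3)
puts output_name(run_id, "tophat", 1, extension: "sam", param: "--sensitive")
puts output_name(run_id, "tophat", 2)       # folder-style name, no extension
```

Because the run identifier is generated once and passed in, only the job number and the optional parameter suffix vary between the names of a single run, which matches the IMPORTANT note in the README.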
data/VERSION
CHANGED
@@ -1 +1 @@
-0.
+0.3.0
data/bio-grid.gemspec
CHANGED
data/lib/bio/grid/job.rb
CHANGED
@@ -18,15 +18,13 @@ module Bio
 inputs.each do |input|
   commandline.gsub!(/<#{input}>/,groups[input][index].join(self.options[:sep]))
 end
-job_output = ""
+job_output = self.options[:output]+"/#{options[:uuid]}_"+self.options[:name]+"_%03d" % (index+1).to_s + "#{self.options[:parameter_value]}"
 if commandline =~/<output>\.(\S+)/
   extension = $1
-  job_output = self.options[:output]+"/#{options[:uuid]}_"+self.options[:name]+"_output_%03d" % (index+1).to_s + "#{self.options[:parameter_value]}"
   commandline.gsub!(/<output>/,job_output)
   job_output << ".#{extension}"
 else
   self.options[:output_folder] = true
-  job_output = self.options[:output]+"/#{options[:uuid]}_"+self.options[:name]
   commandline.gsub!(/<output>/,job_output)
 end
 self.instructions << commandline+"\n"
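Reviewer note: the effect of this hunk is that the output name is now built once, before the file/folder branch. Folder outputs therefore gain the incremental number and the parameter suffix, and file outputs lose the former `_output` infix. The standalone paraphrase below may help when checking that behaviour. `resolve_output` and its keyword arguments are hypothetical names and `my_tool` is a placeholder command; only the regular expression and the concatenation order are taken from the diff.

```ruby
# Standalone paraphrase of the updated logic in job.rb (hypothetical method name).
# Returns the filled-in command line, the expected output path, and whether the
# output is treated as a folder.
def resolve_output(commandline, output_dir:, uuid:, name:, index:, parameter_value: "")
  # The name is built once, for both the file and the folder case.
  job_output = "#{output_dir}/#{uuid}_#{name}_" + format("%03d", index + 1) + parameter_value.to_s

  if commandline =~ /<output>\.(\S+)/
    # An extension follows the placeholder: the job writes a single file.
    extension = Regexp.last_match(1)
    command = commandline.gsub(/<output>/, job_output)
    { command: command, output: "#{job_output}.#{extension}", folder: false }
  else
    # No extension: the job is assumed to write into a folder.
    command = commandline.gsub(/<output>/, job_output)
    { command: command, output: job_output, folder: true }
  end
end

file_job = resolve_output("my_tool in.fq > <output>.bam",
                          output_dir: "/out", uuid: "3cb0b800", name: "map", index: 0)
folder_job = resolve_output("my_tool in.fq -o <output>",
                            output_dir: "/out", uuid: "3cb0b800", name: "map", index: 1,
                            parameter_value: "-param:7")
puts file_job[:command]
puts folder_job[:output]
```

In the file case the placeholder is replaced by the name without its extension, so the extension already present in the command line completes the path.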
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: bio-grid
 version: !ruby/object:Gem::Version
-version: 0.
+version: 0.3.0
 prerelease:
 platform: ruby
 authors:
@@ -146,7 +146,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
 version: '0'
 segments:
 - 0
-hash:
+hash: 2907219030788310971
 required_rubygems_version: !ruby/object:Gem::Requirement
 none: false
 requirements: