protk 1.4.1 → 1.4.2
- checksums.yaml +4 -4
- data/README.md +32 -15
- data/bin/mzid_to_pepxml.rb +75 -0
- data/bin/mzid_to_protxml.rb +77 -0
- data/bin/protxml_to_gff.rb +1 -1
- data/bin/sixframe.rb +24 -5
- data/bin/spectrast_create.rb +125 -0
- data/bin/spectrast_filter.rb +108 -0
- data/lib/protk/command_runner.rb +1 -1
- data/lib/protk/data/template_pep.xml +34 -0
- data/lib/protk/data/template_prot.xml +39 -0
- data/lib/protk/mzidentml_doc.rb +140 -0
- data/lib/protk/mzml_parser.rb +9 -0
- data/lib/protk/peptide.rb +39 -5
- data/lib/protk/pepxml_writer.rb +24 -0
- data/lib/protk/physical_constants.rb +1 -0
- data/lib/protk/protein.rb +64 -1
- data/lib/protk/protein_group.rb +70 -0
- data/lib/protk/protxml_writer.rb +27 -0
- data/lib/protk/psm.rb +222 -0
- data/lib/protk/search_tool.rb +1 -6
- data/lib/protk/sniffer.rb +35 -0
- data/lib/protk/spectrum_query.rb +132 -0
- metadata +20 -2
checksums.yaml CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7377c1480498f852b7e747d13e9a7d985523fcef
+  data.tar.gz: 2cb2c652e53ec636fb521cb35a687324ee810af8
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c4e72457cc9ada490ea6210c9d13e6e5d240c0d19399c5b015feb30cebb881b51bae62cb8d4aa831d8aee397a747d5c16b1433f0ea08656474a56270f709b3d7
+  data.tar.gz: e8893c4fda75666fdf4ed3cb6d6dc0bb34ceb3041c2b19883e82891aa5245a4a7aa99cc4fe677f2fb6f28c2b12d5bbb89a213a59e3f1f0e4d8dc927a9f6ff510
```
data/README.md CHANGED

````diff
@@ -22,7 +22,10 @@ Protk is a ruby gem and requires ruby 2.0 or higher with support for libxml2. To
 gem install protk
 ```
 
+## Ruby Compatibility
 
+In general Protk requires ruby with a version >=2.0.
+Do not use ruby 2.1.5 as this has a bug that causes a deadlock related to open4 and child processes writing to stderr.
 
 ## Usage
 
@@ -60,32 +63,28 @@ By default protk will install tools and databases into `.protk` in your home dir
 ```
 
 
-## Sequence databases
 
-
+## Galaxy Integration
 
-
-manage_db.rb add --predefined crap
-manage_db.rb add --predefined sphuman
-manage_db.rb update crap
-manage_db.rb update sphuman
-```
+Many protk tools have equivalent galaxy wrappers available on the [galaxy toolshed](http://toolshed.g2.bx.psu.edu/) with source code and development occurring in the [protk-galaxytools](https://github.com/iracooke/protk-galaxytools) repository on github. In order for these tools to work you will also need to make sure that protk, as well as the necessary third party dependencies, are available to galaxy during tool execution.
 
-
+There are two ways to do this:
 
-
-manage_db.rb add --ftp-source 'ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/reldate.txt' --include-filters '/OS=Homo\ssapiens/' --id-regex 'sp\|.*\|(.*?)\s' --add-decoys --make-blast-index --archive-old sphuman
-```
+**Using Docker:**
 
-
+By far the easiest way to do this is to set up your Galaxy instance to run tools in Docker containers. All the tools in the [protk-galaxytools](https://github.com/iracooke/protk-galaxytools) repository are designed to work with [this](https://github.com/iracooke/protk-dockerfile) docker image, and will download and use the image automatically on appropriately configured Galaxy instances.
+
+**Manual Install**
 
-
+If your galaxy instance is unable to use Docker for some reason you will need to install `protk` and its dependencies manually.
+
+One way to install protk would be to just do `gem install protk` using the default system ruby (without rvm). This will probably just work, however you will lose the ability to run specific versions of tools against specific versions of protk. The recommended method of installing protk for use with galaxy is as follows;
 
 1. Ensure you have a working install of galaxy.
 
 [Full instructions](https://wiki.galaxyproject.org/Admin/GetGalaxy) are available on the official Galaxy project wiki page. We assume you have galaxy installed in a directory called galaxy-dist.
 
-2. Install rvm if you haven't
+2. Install rvm if you haven't already. See [here](https://rvm.io/) for more information.
 
 ```bash
 curl -sSL https://get.rvm.io | bash -s stable
@@ -148,4 +147,22 @@ Many protk tools have equivalent galaxy wrappers available on the [galaxy toolsh
 ln -s 1.5 default
 ```
 
+## Sequence databases
+
+All `protk` tools are designed to work with sequence databases provided as simple fasta formatted flat files. For most use cases it is simplest to just manage these manually.
+
+Protk includes a script called `manage_db.rb` to install certain sequence databases in a central repository. Databases installed via `manage_db.rb` can be invoked using a shorthand name rather than a full path to a fasta file. Protk comes with several predefined database configurations. For example, to install a database consisting of human entries from Swissprot plus known contaminants use the following commands;
+
+```sh
+manage_db.rb add --predefined crap
+manage_db.rb add --predefined sphuman
+manage_db.rb update crap
+manage_db.rb update sphuman
+```
+
+You should now be able to run database searches, specifying this database by using the `-d sphuman` flag. Every month or so swissprot will release a new database version. You can keep your database up to date using the `manage_db.rb update` command. This will update the database only if any of its source files (or ftp release notes) have changed. The `manage_db.rb` tool also allows completely custom databases to be configured. Setup requires adding quite a few command-line options, but once set up, databases can easily be updated without further config. The example below shows the command-line arguments required to manually configure the sphuman database.
+
+```sh
+manage_db.rb add --ftp-source 'ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/reldate.txt' --include-filters '/OS=Homo\ssapiens/' --id-regex 'sp\|.*\|(.*?)\s' --add-decoys --make-blast-index --archive-old sphuman
+```
 
````
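The Ruby Compatibility note added above can be expressed as a small preflight check. This is an illustrative sketch only, not part of protk, and the `ruby_ok?` helper name is invented here; it simply encodes the rule "version >= 2.0, but never 2.1.5".

```ruby
require 'rubygems'

# Illustrative sketch (not part of protk): encode the README's ruby
# compatibility rule -- version >= 2.0 required, but 2.1.5 excluded
# because of the open4/stderr deadlock noted above.
def ruby_ok?(version)
  v = Gem::Version.new(version)
  v >= Gem::Version.new("2.0") && v != Gem::Version.new("2.1.5")
end

puts ruby_ok?(RUBY_VERSION)
```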
data/bin/mzid_to_pepxml.rb ADDED

```diff
@@ -0,0 +1,75 @@
+#!/usr/bin/env ruby
+#
+# This file is part of protk
+# Created by Ira Cooke 8/5/2015
+#
+# Convert mzid to pepXML
+#
+#
+
+require 'libxml'
+require 'protk/constants'
+require 'protk/command_runner'
+require 'protk/mzidentml_doc'
+require 'protk/spectrum_query'
+require 'protk/pepxml_writer'
+require 'protk/tool'
+
+include LibXML
+
+XML.indent_tree_output=true
+
+
+# Setup specific command-line options for this tool. Other options are inherited from Tool
+#
+tool=Tool.new([:explicit_output,:debug])
+# tool.add_value_option(:minprob,0.05,['--minprob mp',"Minimum probability for psm to be included in the output"])
+
+tool.option_parser.banner = "Convert an mzIdentML file to pep.xml\n\nUsage: mzid_to_pepxml.rb [options] file1.mzid"
+
+exit unless tool.check_options(true)
+
+$protk = Constants.instance
+log_level = tool.debug ? "info" : "warn"
+$protk.info_level= log_level
+
+input_file=ARGV[0]
+
+if tool.explicit_output
+  output_file_name=tool.explicit_output
+else
+  output_file_name=Tool.default_output_path(input_file,".pep.xml","","")
+end
+
+pep_xml_writer = PepXMLWriter.new
+
+mzid_doc = MzIdentMLDoc.new(input_file)
+
+spectrum_queries = mzid_doc.spectrum_queries
+
+n_queries = spectrum_queries.length
+
+$protk.log "Converting #{n_queries} spectrum queries", :info
+$protk.log "Output will be written to #{output_file_name}", :info
+
+i=0
+n_written=0
+progress_increment=1
+spectrum_queries.each do |query_node|
+  if i % progress_increment ==0
+    $stdout.write "Scanned #{i} and read #{n_written} of #{n_queries}\r"
+  end
+
+  # require 'byebug';byebug
+
+  query = SpectrumQuery.from_mzid(query_node)
+  pep_xml_writer.append_spectrum_query(query.as_pepxml)
+  n_written+=1
+
+  i+=1
+
+end
+
+$protk.log "Writing #{n_written} spectrum queries to #{output_file_name}", :info
+
+pep_xml_writer.save(output_file_name)
```
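When no `-o` option is given, the script above derives its output name from the input via `Tool.default_output_path(input_file,".pep.xml","","")`. Assuming that call simply swaps the input file's extension, the behaviour can be sketched with a standalone helper (`default_pepxml_name` is a hypothetical name, used here only for illustration):

```ruby
# Sketch of the default output naming used by mzid_to_pepxml.rb, under the
# assumption that Tool.default_output_path replaces the input extension.
def default_pepxml_name(input_file)
  dir  = File.dirname(input_file)
  base = File.basename(input_file, File.extname(input_file))
  File.join(dir, base + ".pep.xml")
end

puts default_pepxml_name("results/sample.mzid")  # => results/sample.pep.xml
```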
data/bin/mzid_to_protxml.rb ADDED

```diff
@@ -0,0 +1,77 @@
+#!/usr/bin/env ruby
+#
+# This file is part of protk
+# Created by Ira Cooke 7/5/2015
+#
+# Convert mzid to protXML
+#
+#
+
+require 'libxml'
+require 'protk/constants'
+require 'protk/command_runner'
+require 'protk/mzidentml_doc'
+require 'protk/protein_group'
+require 'protk/tool'
+
+include LibXML
+
+XML.indent_tree_output=true
+
+
+# Setup specific command-line options for this tool. Other options are inherited from ProphetTool
+#
+tool=Tool.new([:explicit_output,:debug])
+tool.add_value_option(:minprob,0.05,['--minprob mp',"Minimum probability for protein to be included in the output"])
+
+tool.option_parser.banner = "Convert an mzIdentML file to protXML.\n\nUsage: mzid_to_protxml.rb [options] file1.mzid"
+
+exit unless tool.check_options(true)
+
+$protk = Constants.instance
+log_level = tool.debug ? "info" : "warn"
+$protk.info_level= log_level
+
+input_file=ARGV[0]
+
+if tool.explicit_output
+  output_file_name=tool.explicit_output
+else
+  output_file_name=Tool.default_output_path(input_file,".protXML","","")
+end
+
+prot_xml_writer = ProtXMLWriter.new
+
+mzid_doc = MzIdentMLDoc.new(input_file)
+
+protein_groups = mzid_doc.protein_groups
+
+n_prots = protein_groups.length
+
+$protk.log "Converting #{n_prots} protein_groups", :info
+$protk.log "Output will be written to #{output_file_name}", :info
+
+i=0
+n_written=0
+progress_increment=1
+protein_groups.each do |group_node|
+  if i % progress_increment ==0
+    $stdout.write "Scanned #{i} and read #{n_written} of #{n_prots}\r"
+  end
+
+  # require 'byebug';byebug
+  group_prob = MzIdentMLDoc.get_cvParam(group_node,"MS:1002470").attributes['value'].to_f*0.01
+
+  if group_prob > tool.minprob.to_f
+    group = ProteinGroup.from_mzid(group_node)
+    prot_xml_writer.append_protein_group(group.as_protxml)
+    n_written+=1
+  end
+
+  i+=1
+
+end
+
+$protk.log "Writing #{n_written} proteins to #{output_file_name}", :info
+
+prot_xml_writer.save(output_file_name)
```
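The filtering step in the script above reads the `MS:1002470` cvParam value as a percentage (0–100), scales it by 0.01 into a probability, and keeps the group only if it exceeds `--minprob`. That logic can be isolated as a self-contained sketch; the `keep_group?` helper name and the sample values are illustrative only:

```ruby
# Standalone sketch of the group-probability filter in mzid_to_protxml.rb:
# the cvParam value is a percentage string, converted to a 0-1 probability
# and compared against the --minprob threshold.
def keep_group?(cv_percent_value, minprob)
  group_prob = cv_percent_value.to_f * 0.01
  group_prob > minprob.to_f
end

puts keep_group?("99.2", 0.05)  # high-confidence group, kept
puts keep_group?("3.0", 0.05)   # below threshold, dropped
```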
data/bin/protxml_to_gff.rb CHANGED

```diff
@@ -155,7 +155,7 @@ proteins.each do |protein|
   peptides = tool.stack_charge_states ? protein.peptides : protein.representative_peptides
 
   peptides.each do |peptide|
-    if peptide.
+    if peptide.probability >= tool.peptide_probability_threshold
       peptide_entries = peptide.to_gff3_records(protein_entry.aaseq,gff_parent_entry,gff_cds_entries)
       peptide_entries.each do |peptide_entry|
         output_fh.write peptide_entry.to_s
```
data/bin/sixframe.rb CHANGED

```diff
@@ -25,7 +25,7 @@ end
 
 tool=Tool.new([:explicit_output])
 tool.option_parser.banner = "Create a sixframe translation of a genome.\n\nUsage: sixframe.rb [options] genome.fasta"
-
+tool.add_boolean_option(:peptideshaker,false,['--peptideshaker', 'Format fasta output for peptideshaker compatibility'])
 tool.add_boolean_option(:print_coords,false,['--coords', 'Write genomic coordinates in the fasta header'])
 tool.add_boolean_option(:keep_header,true,['--strip-header', 'Dont write sequence definition'])
 tool.add_value_option(:min_len,20,['--min-len l','Minimum ORF length to keep'])
@@ -43,8 +43,22 @@ if tool.write_gff
   output_fh.write "##gff-version 3\n"
 end
 
+accession_prefix=tool.peptideshaker ? "generic" : "lcl"
+coords_separator=tool.peptideshaker ? "|" : " "
+
 file = Bio::FastaFormat.open(input_file)
 
+def passes_qc(orf,tool)
+  long_enough = orf.length > tool.min_len.to_i
+
+  composition_ok=true
+  if tool.peptideshaker && (orf=~/X/)
+    composition_ok=false
+  end
+
+  (long_enough && composition_ok)
+end
+
 file.each do |entry|
 
   length = entry.naseq.length
@@ -58,7 +72,7 @@ file.each do |entry|
   oi=0
   orfs.each do |orf|
     oi+=1
-    if ( orf
+    if ( passes_qc(orf,tool) )
 
       position_start = position
       position_end = position_start + orf.length*3 -1
@@ -71,15 +85,20 @@
       end
 
       # Create accession compliant with NCBI naming standard
+      #
      # See http://www.ncbi.nlm.nih.gov/books/NBK7183/?rendertype=table&id=ch_demo.T5
+      #
+      # Or with PeptideShaker standard
+      #
+      #
      ncbi_scaffold_id = entry.entry_id.gsub('|','_').gsub(' ','_')
-      ncbi_accession = "
+      ncbi_accession = "#{accession_prefix}|#{ncbi_scaffold_id}_frame_#{frame}_orf_#{oi}"
      gff_id = "#{ncbi_scaffold_id}_frame_#{frame}_orf_#{oi}"
 
      defline=">#{ncbi_accession}"
 
      if tool.print_coords
-        defline << "
+        defline << "#{coords_separator}#{position_start}|#{position_end}"
      end
 
      if tool.keep_header
@@ -88,7 +107,7 @@ file.each do |entry|
 
      if tool.write_gff
        strand = frame>3 ? "-" : "+"
-        # score = self.
+        # score = self.probability.nil? ? "." : self.probability.to_s
        # gff_string = "#{parent_record.seqid}\tMSMS\tpolypeptide\t#{start_i}\t#{end_i}\t#{score}\t#{parent_record.strand}\t0\tID=#{this_id};Parent=#{cds_id}"
        output_fh.write("#{ncbi_scaffold_id}\tsixframe\tCDS\t#{position_start}\t#{position_end}\t.\t#{strand}\t0\tID=#{gff_id}\n")
      else
```
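The `passes_qc` check added above keeps an ORF only if it exceeds the minimum length and, when PeptideShaker-compatible output is requested, contains no ambiguous `X` residues. Here is the same logic as a self-contained sketch: in the real script `orf` is a translated frame from BioRuby and the settings come from `tool`, whereas this version takes plain arguments for illustration:

```ruby
# Standalone version of sixframe.rb's ORF quality check.
def passes_qc(orf, min_len, peptideshaker)
  long_enough = orf.length > min_len
  # PeptideShaker-compatible output rejects ORFs containing ambiguous 'X'
  composition_ok = !(peptideshaker && orf =~ /X/)
  long_enough && composition_ok
end

puts passes_qc("M" * 30, 20, false)         # long enough, kept
puts passes_qc("MXA" + "M" * 30, 20, true)  # contains X, rejected for peptideshaker
puts passes_qc("MMM", 20, false)            # too short, rejected
```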
data/bin/spectrast_create.rb ADDED

```diff
@@ -0,0 +1,125 @@
+#!/usr/bin/env ruby
+#
+# This file is part of protk
+# Created by Ira Cooke 30/4/2015
+#
+# A wrapper for the SpectraST create command
+#
+#
+
+require 'protk/constants'
+require 'protk/command_runner'
+require 'protk/tool'
+require 'protk/galaxy_util'
+require 'protk/pepxml'
+require 'protk/sniffer'
+require 'protk/mzml_parser'
+
+for_galaxy = GalaxyUtil.for_galaxy?
+
+genv=Constants.instance
+
+# Setup specific command-line options for this tool. Other options are inherited from ProphetTool
+#
+spectrast_tool=Tool.new([:explicit_output])
+spectrast_tool.option_parser.banner = "Create a spectral library from pep.xml input files.\n\nUsage: spectrast_create.rb [options] file1.pep.xml file1.pep.xml ..."
+spectrast_tool.add_value_option(:spectrum_files,"",['--spectrum-files sf','Paths to raw spectrum files. These should be provided in a comma separated list'])
+spectrast_tool.add_boolean_option(:binary_output,false,['-B','--binary-output','Produce spectral libraries in binary format rather than ASCII'])
+spectrast_tool.add_value_option(:filter_predicate,nil,['--predicate pred','Keep only spectra satifying predicate pred. Should be a C-style predicate'])
+spectrast_tool.add_value_option(:probability_threshold,0.99,['--p-thresh val', 'Probability threshold below which spectra are discarded'])
+spectrast_tool.add_value_option(:instrument_acquisition,"CID",['--instrument-acquisition setting',
+  'Set the instrument and acquisition settings of the spectra (in case not specified in data files).
+  Examples: CID, ETD, CID-QTOF, HCD. The latter two are treated as high-mass accuracy spectra.'])
+
+exit unless spectrast_tool.check_options(true)
+
+spectrast_bin = %x[which spectrast].chomp
+
+# Options: GENERAL OPTIONS
+# -cF<file> Read create options from file <file>.
+#   If <file> is not given, "spectrast_create.params" is assumed.
+#   NOTE: All options set in the file will be overridden by command-line options, if specified.
+# -cm<remark> Remark. Add a Remark=<remark> comment to all library entries created.
+# -cM<format> Write all library spectra as MRM transition tables. Leave <format> blank for default. (Turn off with -cM!)
+# -cT<file> Use probability table in <file>. Only those peptide ions included in the table will be imported.
+#   A probability table is a text file with one peptide ion in the format AC[160]DEFGHIK/2 per line.
+#   If a probability is supplied following the peptide ion separated by a tab, it will be used to replace the original probability of that library entry.
+# -cO<file> Use protein list in <file>. Only those peptide ions associated with proteins in the list will be imported.
+#   A protein list is a text file with one protein identifier per line.
+#   If a number X is supplied following the protein separated by a tab, then at most X peptide ions associated with that protein will be imported.
+
+# PEPXML IMPORT OPTIONS (Applicable with .pepXML files)
+# -cP<prob> Include all spectra identified with probability no less than <prob> in the library.
+# -cq<fdr> (Only PepXML import) Only include spectra with global FDR no greater than <fdr> in the library.
+# -cn<name> Specify a dataset identifier for the file to be imported.
+# -co Add the originating mzXML file name to the dataset identifier. Good for keeping track of in which
+#   MS run the peptide is observed. (Turn off with -co!)
+# -cg Set all asparagines (N) in the motif NX(S/T) as deamidated (N[115]). Use for glycocaptured peptides. (Turn off with -cg!).
+# -cI Set the instrument and acquisition settings of the spectra (in case not specified in data files).
+#   Examples: -cICID, -cIETD, -cICID-QTOF, -cIHCD. The latter two are treated as high-mass accuracy spectra.
+#
+
+# -cf<pred> Filter library. Keep only those entries satisfying the predicate <pred>.
+#   <pred> should be a C-style predicate in quotes.
+
+input_stagers=[]
+inputs=ARGV.collect { |file_name| file_name.chomp}
+if for_galaxy
+  input_stagers = inputs.collect {|ip| GalaxyUtil.stage_pepxml(ip) }
+  inputs=input_stagers.collect { |sg| sg.staged_path }
+end
+
+spectrum_file_paths=spectrast_tool.spectrum_files.split(",").collect { |mod| mod.lstrip.rstrip }.reject {|e| e.empty? }
+
+spectrum_file_paths.each do |rf|
+  throw "Provided spectrum file #{rf} does not exist" unless File.exists? rf
+  format = Sniffer.sniff_format(rf)
+  throw "Unrecognised format #{format} detected for spectrum file #{rf}" unless ["mzML","mgf"].include? format
+
+  # basename_no_ext = File.basename(rf,File.extname(rf))
+  runid_name = MzMLParser.new(rf).next_runid()
+
+  expected_name = "#{runid_name}.#{format}"
+
+  if for_galaxy || !File.exists?(expected_name)
+    raw_input_stager = GalaxyStager.new(rf,{:extension=>".#{format}",:name=>runid_name})
+    puts raw_input_stager.staged_path
+  end
+
+end
+
+
+cmd="#{spectrast_bin} "
+
+unless spectrast_tool.binary_output
+  cmd << " -c_BIN!"
+end
+
+if spectrast_tool.filter_predicate
+  cmd << " -cf'#{spectrast_tool.filter_predicate}'"
+end
+
+
+
+cmd << " -cI#{spectrast_tool.instrument_acquisition}"
+
+if spectrast_tool.explicit_output==nil
+  output_file_name=Tool.default_output_path(inputs,"","","")
+else
+  output_file_name=spectrast_tool.explicit_output
+end
+
+cmd << " -cN#{output_file_name}"
+
+cmd << " -cP#{spectrast_tool.probability_threshold}"
+
+inputs.each { |ip| cmd << " #{ip}" }
+
+# code = spectrast_tool.run(cmd,genv)
+# throw "Command failed with exit code #{code}" unless code==0
+
+%x[#{cmd}]
+
+
+
+
```
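The second half of the script above assembles a SpectraST command string flag by flag. The `-c_BIN!`, `-cI`, `-cN` and `-cP` flags come from the SpectraST help text quoted in the script itself; the helper below is an illustrative, self-contained condensation of that assembly, not protk code:

```ruby
# Sketch of how spectrast_create.rb builds the SpectraST command line.
def spectrast_create_cmd(inputs, output, prob = 0.99, instrument = "CID", binary = false)
  cmd = "spectrast"
  cmd << " -c_BIN!" unless binary  # ASCII library output unless binary requested
  cmd << " -cI#{instrument}"       # instrument/acquisition settings
  cmd << " -cN#{output}"           # output library name
  cmd << " -cP#{prob}"             # minimum probability for imported spectra
  inputs.each { |ip| cmd << " #{ip}" }
  cmd
end

puts spectrast_create_cmd(["sample.pep.xml"], "mylib")
# => spectrast -c_BIN! -cICID -cNmylib -cP0.99 sample.pep.xml
```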