protk 1.4.2 → 1.4.3
- checksums.yaml +4 -4
- data/README.md +31 -20
- data/bin/filter_fasta.rb +74 -0
- data/bin/filter_psms.rb +109 -0
- data/bin/msgfplus_search.rb +19 -10
- data/bin/mzid_to_protxml.rb +8 -5
- data/bin/peptide_prophet.rb +6 -1
- data/bin/protxml_to_gff.rb +12 -2
- data/lib/protk/constants.rb +1 -1
- data/lib/protk/mzidentml_doc.rb +57 -13
- data/lib/protk/peptide.rb +6 -6
- data/lib/protk/prophet_tool.rb +3 -1
- data/lib/protk/protein.rb +15 -8
- data/lib/protk/protein_group.rb +11 -4
- data/lib/protk/psm.rb +6 -6
- data/lib/protk/search_tool.rb +5 -5
- data/lib/protk/spectrum_query.rb +4 -4
- data/lib/protk/tool.rb +34 -26
- metadata +7 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 31df514a2203236ea9ac25f8d5cc9c282378e04d
|
4
|
+
data.tar.gz: f1bb438ef01003afc166eb5b0342dbd8f6ecd09b
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 7f7f0fe81411f17b89037162ad7bf5374be69309888bda2f84b058d69773dd1790469d40dd5797e0306ce49ecc95426cfb80b65bfcb95ac0a052be90db40ea42
|
7
|
+
data.tar.gz: f2db6018ac90079e925f7c5c071be3fbedd77bc3787cfc58a91aa38048f43f60426020a3bff3a5a58c4bea6d71aaad7c0ab10437e050eaf1998792ca6ab2e1dd
|
data/README.md
CHANGED
@@ -3,8 +3,6 @@
|
|
3
3
|
|
4
4
|
# protk ( Proteomics toolkit )
|
5
5
|
|
6
|
-
|
7
|
-
***
|
8
6
|
## What is it?
|
9
7
|
|
10
8
|
Protk is a suite of tools for proteomics. It aims to present a simple and consistent command-line interface across otherwise disparate third party tools. The following analysis tasks are currently supported;
|
@@ -68,15 +66,27 @@ By default protk will install tools and databases into `.protk` in your home dir
|
|
68
66
|
|
69
67
|
Many protk tools have equivalent galaxy wrappers available on the [galaxy toolshed](http://toolshed.g2.bx.psu.edu/) with source code and development occuring in the [protk-galaxytools](github.com/iracooke/protk-galaxytools) repository on github. In order for these tools to work you will also need to make sure that protk, as well as the necessary third party dependencies are available to galaxy during tool execution.
|
70
68
|
|
71
|
-
There are
|
69
|
+
There are three ways to do this
|
72
70
|
|
73
71
|
**Using Docker:**
|
74
72
|
|
75
73
|
By far the easiest way to do this is to set up your Galaxy instance to run tools in Docker containers. All the tools in the [protk-galaxytools](github.com/iracooke/protk-galaxytools) repository are designed to work with [this](https://github.com/iracooke/protk-dockerfile) docker image, and will download and use the image automatically on apprioriately configured Galaxy instances.
|
76
74
|
|
75
|
+
**Using the Galaxy Tool Shed (Experimental)**
|
76
|
+
|
77
|
+
An installation recipe of `protk` is available from the [Galaxy Tool Shed](https://testtoolshed.g2.bx.psu.edu/view/iuc/package_protk_1_4_2/). If you want to depend on protk for your own Galaxy wrapper create a `tool_dependencies.xml` file with the following content.
|
78
|
+
|
79
|
+
```xml
|
80
|
+
<tool_dependency>
|
81
|
+
<package name="protk" version="1.4.2">
|
82
|
+
<repository name="package_protk_1_4_2" owner="iuc"/>
|
83
|
+
</package>
|
84
|
+
</tool_dependency>
|
85
|
+
```
|
86
|
+
|
77
87
|
**Manual Install**
|
78
88
|
|
79
|
-
If your galaxy instance is unable to use Docker for some reason you will need to install `protk` and its dependencies manually.
|
89
|
+
If your galaxy instance is unable to use Docker or the Tool Shed for some reason you will need to install `protk` and its dependencies manually.
|
80
90
|
|
81
91
|
One way to install protk would be to just do `gem install protk` using the default system ruby (without rvm). This will probably just work, however you will lose the ability to run specific versions of tools against specific versions of protk. The recommended method of installing protk for use with galaxy is as follows;
|
82
92
|
|
@@ -98,13 +108,13 @@ One way to install protk would be to just do `gem install protk` using the defau
|
|
98
108
|
|
99
109
|
4. Install protk in an isolated gemset using rvm.
|
100
110
|
|
101
|
-
This sets up an isolated environment where only a specific version of protk is available. We name the environment according to the protk
|
111
|
+
This sets up an isolated environment where only a specific version of protk is available. We name the environment according to the protk version number (1.4.2 in this example).
|
102
112
|
|
103
113
|
```bash
|
104
114
|
rvm 2.1
|
105
|
-
rvm gemset create protk1.4
|
106
|
-
rvm use 2.1@protk1.4
|
107
|
-
gem install protk -v '~>1.4'
|
115
|
+
rvm gemset create protk1.4.2
|
116
|
+
rvm use 2.1@protk1.4.2
|
117
|
+
gem install protk -v '~>1.4.2'
|
108
118
|
```
|
109
119
|
|
110
120
|
5. Configure Galaxy's tool dependency directory.
|
@@ -124,11 +134,11 @@ One way to install protk would be to just do `gem install protk` using the defau
|
|
124
134
|
cd <tool_dependency_dir>
|
125
135
|
mkdir protk
|
126
136
|
cd protk
|
127
|
-
mkdir 1.4
|
128
|
-
ln -s 1.4 default
|
129
|
-
rvm use 2.1@protk1.4
|
130
|
-
rvmenv=`rvm env --path 2.1@protk1.4`
|
131
|
-
echo ". $rvmenv" > 1.4/env.sh
|
137
|
+
mkdir 1.4.2
|
138
|
+
ln -s 1.4.2 default
|
139
|
+
rvm use 2.1@protk1.4.2
|
140
|
+
rvmenv=`rvm env --path 2.1@protk1.4.2`
|
141
|
+
echo ". $rvmenv" > 1.4.2/env.sh
|
132
142
|
```
|
133
143
|
|
134
144
|
7. Keep things up to date
|
@@ -137,14 +147,14 @@ One way to install protk would be to just do `gem install protk` using the defau
|
|
137
147
|
|
138
148
|
```bash
|
139
149
|
rvm 2.1
|
140
|
-
rvm gemset create protk1.5
|
141
|
-
rvm use 2.1@protk1.5
|
142
|
-
gem install protk -v '~>1.5'
|
150
|
+
rvm gemset create protk1.5.0
|
151
|
+
rvm use 2.1@protk1.5.0
|
152
|
+
gem install protk -v '~>1.5.0'
|
143
153
|
cd <tool_dependency_dir>/protk/
|
144
|
-
mkdir 1.5
|
145
|
-
rvmenv=`rvm env --path 2.1@protk1.5`
|
146
|
-
echo ". $rvmenv" > 1.5/env.sh
|
147
|
-
ln -s 1.5 default
|
154
|
+
mkdir 1.5.0
|
155
|
+
rvmenv=`rvm env --path 2.1@protk1.5.0`
|
156
|
+
echo ". $rvmenv" > 1.5.0/env.sh
|
157
|
+
ln -s 1.5.0 default
|
148
158
|
```
|
149
159
|
|
150
160
|
## Sequence databases
|
@@ -166,3 +176,4 @@ You should now be able to run database searches, specifying this database by usi
|
|
166
176
|
manage_db.rb add --ftp-source 'ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/reldate.txt' --include-filters '/OS=Homo\ssapiens/' --id-regex 'sp\|.*\|(.*?)\s' --add-decoys --make-blast-index --archive-old sphuman
|
167
177
|
```
|
168
178
|
|
179
|
+
|
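Note on the README hunks above: the install commands move from `~>1.4` to `~>1.4.2`, which narrows the range of protk releases a gemset can pick up. The sketch below uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes to show the difference; the `allows?` helper is a hypothetical name introduced here for illustration and is not part of protk.

```ruby
require 'rubygems'

# Hypothetical helper: does `version` satisfy the `constraint` string?
def allows?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

puts allows?('~> 1.4', '1.5.0')    # looser constraint: a 1.5 release is accepted
puts allows?('~> 1.4.2', '1.4.3')  # tighter constraint: patch releases are accepted
puts allows?('~> 1.4.2', '1.5.0')  # tighter constraint: the next minor release is not
```

With the tighter constraint, a gemset named `protk1.4.2` should only ever contain a 1.4.x release, which matches the README's stated goal of running specific tool versions against specific protk versions.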
data/bin/filter_fasta.rb
ADDED
@@ -0,0 +1,74 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
#
|
3
|
+
# This file is part of protk
|
4
|
+
# Created by Ira Cooke 22/5/2015
|
5
|
+
#
|
6
|
+
# Filters a fasta file so only entries matching a condition are emitted
|
7
|
+
#
|
8
|
+
|
9
|
+
require 'protk/constants'
|
10
|
+
require 'protk/command_runner'
|
11
|
+
require 'protk/tool'
|
12
|
+
require 'set'
|
13
|
+
require 'bio'
|
14
|
+
|
15
|
+
tool=Tool.new([:explicit_output])
|
16
|
+
tool.option_parser.banner = "Filter entries in a fasta file.\n\nUsage: filter_fasta.rb [options] file.fasta file2.fasta"
|
17
|
+
tool.add_value_option(:definition_filter,nil,['--definition filter','Keep entries matching definition'])
|
18
|
+
tool.add_boolean_option(:invert,false,['--invert',"Invert Filter"])
|
19
|
+
tool.add_value_option(:id_filter,nil,['-I filename','--id-filter filename',"Keep entries with given identifiers"])
|
20
|
+
|
21
|
+
exit unless tool.check_options(true)
|
22
|
+
|
23
|
+
input_file=ARGV[0]
|
24
|
+
|
25
|
+
output_fh = tool.explicit_output!=nil ? File.new("#{tool.explicit_output}",'w') : $stdout
|
26
|
+
|
27
|
+
$filter_ids = Set.new()
|
28
|
+
if tool.id_filter && (File.exists?(tool.id_filter) || tool.id_filter=="-")
|
29
|
+
if tool.id_filter=="-"
|
30
|
+
$filter_ids = $stdin.read.split("\n").collect { |e| e.chomp }
|
31
|
+
else
|
32
|
+
$filter_ids = File.readlines(tool.id_filter).collect { |e| e.chomp }
|
33
|
+
end
|
34
|
+
$filter_ids = Set.new($filter_ids) # Much faster set include than array include
|
35
|
+
end
|
36
|
+
|
37
|
+
def passes_filters(entry,tool)
|
38
|
+
|
39
|
+
if tool.definition_filter
|
40
|
+
if entry.definition =~ /#{tool.definition_filter}/
|
41
|
+
return true
|
42
|
+
else
|
43
|
+
return false
|
44
|
+
end
|
45
|
+
end
|
46
|
+
|
47
|
+
if $filter_ids.length > 0
|
48
|
+
# require 'byebug';byebug
|
49
|
+
if $filter_ids.include? entry.entry_id
|
50
|
+
return true
|
51
|
+
end
|
52
|
+
return false
|
53
|
+
end
|
54
|
+
|
55
|
+
# Always true if there are no filters defined
|
56
|
+
|
57
|
+
return true
|
58
|
+
|
59
|
+
end
|
60
|
+
|
61
|
+
|
62
|
+
ARGV.each do |fasta_file|
|
63
|
+
|
64
|
+
file = Bio::FastaFormat.open(fasta_file.chomp)
|
65
|
+
file.each do |entry|
|
66
|
+
|
67
|
+
|
68
|
+
pass = passes_filters(entry,tool)
|
69
|
+
pass = !pass if tool.invert
|
70
|
+
if pass
|
71
|
+
output_fh.write entry
|
72
|
+
end
|
73
|
+
end
|
74
|
+
end
|
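Note on `filter_fasta.rb`: the order of checks in `passes_filters` means a definition filter, when given, decides the result on its own and the id list is never consulted. The following is a minimal sketch of that decision order, assuming plain `Struct` stand-ins (`Entry`, `Options`) in place of the `Bio::FastaFormat` entries and the protk `Tool` object; the names `passes_filters?` and `keep?` are illustrative only.

```ruby
require 'set'

# Stand-ins (assumptions) for a Bio::FastaFormat entry and the protk Tool options
Entry   = Struct.new(:entry_id, :definition)
Options = Struct.new(:definition_filter, :invert)

# Same decision order as the script: definition filter first, then the id set,
# and with no filters defined every entry passes.
def passes_filters?(entry, options, filter_ids)
  if options.definition_filter
    return !!(entry.definition =~ /#{options.definition_filter}/)
  end
  return filter_ids.include?(entry.entry_id) unless filter_ids.empty?
  true
end

# Applies the --invert flag on top of the filter result
def keep?(entry, options, filter_ids)
  pass = passes_filters?(entry, options, filter_ids)
  options.invert ? !pass : pass
end

human = Entry.new('P1', 'P1 kinase OS=Homo sapiens')
mouse = Entry.new('P2', 'P2 kinase OS=Mus musculus')
ids   = Set.new(['P2'])

puts keep?(human, Options.new('Homo', false), ids)  # definition filter decides, ids ignored
puts keep?(mouse, Options.new(nil, false), ids)     # no definition filter, id set decides
puts keep?(mouse, Options.new(nil, true), ids)      # --invert flips the result
```

One consequence worth keeping in mind when combining `--definition` with `--id-filter`: in this reading of the code the two filters are not intersected.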
data/bin/filter_psms.rb
ADDED
@@ -0,0 +1,109 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
#
|
3
|
+
# This file is part of protk
|
4
|
+
# Created by Ira Cooke 24/6/2015
|
5
|
+
#
|
6
|
+
# Filters a pepxml file by removing or keeping only psms that match a filter
|
7
|
+
#
|
8
|
+
|
9
|
+
require 'protk/constants'
|
10
|
+
require 'protk/command_runner'
|
11
|
+
require 'protk/tool'
|
12
|
+
require 'bio'
|
13
|
+
require 'libxml'
|
14
|
+
|
15
|
+
include LibXML
|
16
|
+
|
17
|
+
tool=Tool.new([:explicit_output,:debug])
|
18
|
+
tool.option_parser.banner = "Filter psms in a pepxml file.\n\nUsage: filter_psms.rb [options] expression file.pepxml"
|
19
|
+
tool.add_value_option(:filter,"protein",['-A','--attribute name',"Match expression against a specific search_hit attribute"])
|
20
|
+
tool.add_boolean_option(:check_alternative_proteins,false,['-C','--check-alternatives',"Also match expression against to alternative_proteins"])
|
21
|
+
tool.add_boolean_option(:reject_mode,false,['-R','--reject',"Keep mismatches instead of matches"])
|
22
|
+
|
23
|
+
exit unless tool.check_options(true,[:filter])
|
24
|
+
|
25
|
+
if ARGV.length!=2
|
26
|
+
puts "Wrong number of arguments. You must supply a filter expression and a pepxml file"
|
27
|
+
exit(1)
|
28
|
+
end
|
29
|
+
|
30
|
+
expressions=ARGV[0].split(",").map(&:strip)
|
31
|
+
input_file=ARGV[1]
|
32
|
+
|
33
|
+
$protk = Constants.instance
|
34
|
+
log_level = tool.debug ? "info" : "warn"
|
35
|
+
$protk.info_level= log_level
|
36
|
+
|
37
|
+
|
38
|
+
output_fh = tool.explicit_output!=nil ? File.new("#{tool.explicit_output}",'w') : $stdout
|
39
|
+
|
40
|
+
throw "Input file #{input_file} does not exist" unless File.exist? "#{input_file}"
|
41
|
+
|
42
|
+
XML::Error.set_handler(&XML::Error::QUIET_HANDLER)
|
43
|
+
|
44
|
+
doc = XML::Document.file("#{input_file}")
|
45
|
+
reader = XML::Reader.document(doc)
|
46
|
+
|
47
|
+
|
48
|
+
# First print out the header (ie before spectrum_queries)
|
49
|
+
File.foreach("#{input_file}") do |line|
|
50
|
+
if line =~ /\<spectrum_query/
|
51
|
+
break;
|
52
|
+
else
|
53
|
+
output_fh.write line
|
54
|
+
end
|
55
|
+
end
|
56
|
+
|
57
|
+
pepxml_ns_prefix="xmlns:"
|
58
|
+
pepxml_ns="xmlns:http://regis-web.systemsbiology.net/pepXML"
|
59
|
+
|
60
|
+
kept=0
|
61
|
+
deleted=0
|
62
|
+
scanned=0
|
63
|
+
|
64
|
+
while reader.read
|
65
|
+
|
66
|
+
if reader.name == "spectrum_query"
|
67
|
+
sq_node = reader.expand
|
68
|
+
|
69
|
+
hits = sq_node.find("./#{pepxml_ns_prefix}search_result/#{pepxml_ns_prefix}search_hit[@hit_rank=\"1\"]",pepxml_ns)
|
70
|
+
|
71
|
+
throw "More than one first ranked search hit" if hits.length>1
|
72
|
+
throw "No search hit for spectrum_query" if hits.length==0
|
73
|
+
|
74
|
+
hit = hits[0]
|
75
|
+
|
76
|
+
has_match = expressions.collect { |expression| (hit.attributes[tool.filter] =~ /#{expression}/) }.any?
|
77
|
+
|
78
|
+
if !has_match && tool.check_alternative_proteins
|
79
|
+
alts = hit.find("./#{pepxml_ns_prefix}alternative_protein",pepxml_ns)
|
80
|
+
|
81
|
+
# Check alternative proteins
|
82
|
+
alt_expr = alts.collect { |alt| expressions.collect { |expression| (alt.attributes[tool.filter] =~ /#{expression}/ )}}
|
83
|
+
|
84
|
+
has_match = alt_expr.flatten.any?
|
85
|
+
end
|
86
|
+
|
87
|
+
if (has_match && !tool.reject_mode) || (!has_match && tool.reject_mode) #&& (hit.attributes['hit_rank']=="1")
|
88
|
+
kept+=1
|
89
|
+
|
90
|
+
# Remove any lower ranked hits
|
91
|
+
#
|
92
|
+
secondary_hits = sq_node.find("./#{pepxml_ns_prefix}search_result/#{pepxml_ns_prefix}search_hit[@hit_rank!=\"1\"]",pepxml_ns)
|
93
|
+
secondary_hits.each { |sh| sh.remove! }
|
94
|
+
|
95
|
+
output_fh.write "#{sq_node}\n"
|
96
|
+
else
|
97
|
+
deleted+=1
|
98
|
+
end
|
99
|
+
|
100
|
+
scanned+=1
|
101
|
+
|
102
|
+
reader.next_sibling
|
103
|
+
end
|
104
|
+
|
105
|
+
end
|
106
|
+
|
107
|
+
output_fh.write "</msms_run_summary>\n</msms_pipeline_analysis>\n"
|
108
|
+
|
109
|
+
$protk.log "Kept #{kept} and deleted #{deleted}" , :info
|
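Note on `filter_psms.rb`: the keep/delete decision reduces to "does any of the comma-separated expressions match the chosen attribute", optionally flipped by `--reject`. The sketch below isolates that logic on plain strings, with no pepXML parsing; the method names are hypothetical and the sample protein names are made up for illustration.

```ruby
# Split a comma-separated expression list the way the script does with ARGV[0]
def parse_expressions(arg)
  arg.split(',').map(&:strip)
end

# True when any expression, used as a regular expression, matches the value
def matches_any?(value, expressions)
  expressions.any? { |expression| value =~ /#{expression}/ }
end

# Keep matches by default; in reject mode keep the mismatches instead
def keep_psm?(value, expressions, reject_mode)
  matches_any?(value, expressions) != reject_mode
end

expressions = parse_expressions('decoy_, ^rev_')

puts keep_psm?('decoy_sp|P12345', expressions, false)  # match, normal mode: kept
puts keep_psm?('decoy_sp|P12345', expressions, true)   # match, reject mode: dropped
puts keep_psm?('sp|P12345', expressions, true)         # mismatch, reject mode: kept
```

Because each expression is interpolated into a regular expression, characters such as `|` or `.` in a filter string act as regex operators unless escaped.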
data/bin/msgfplus_search.rb
CHANGED
@@ -41,17 +41,17 @@ search_tool.options.instrument=0
|
|
41
41
|
|
42
42
|
# MS-GF+ doesnt support fragment tol so add this manually rather than via the SearchTool defaults
|
43
43
|
search_tool.add_value_option(:precursor_tol,"20",['-p','--precursor-ion-tol tol', 'Precursor ion mass tolerance.'])
|
44
|
-
search_tool.add_value_option(:precursor_tolu,"ppm",['--precursor-ion-tol-units tolu', 'Precursor ion mass tolerance units (ppm or Da).
|
44
|
+
search_tool.add_value_option(:precursor_tolu,"ppm",['--precursor-ion-tol-units tolu', 'Precursor ion mass tolerance units (ppm or Da).'])
|
45
45
|
|
46
46
|
search_tool.add_boolean_option(:pepxml,false,['--pepxml', 'Convert results to pepxml.'])
|
47
|
-
search_tool.add_value_option(:isotope_error_range,"0,1",['--isotope-error-range range', 'Takes into account of the error introduced by chooosing a non-monoisotopic peak for fragmentation.
|
47
|
+
search_tool.add_value_option(:isotope_error_range,"0,1",['--isotope-error-range range', 'Takes into account of the error introduced by chooosing a non-monoisotopic peak for fragmentation.'])
|
48
48
|
search_tool.add_value_option(:fragment_method,0,['--fragment-method method', 'Fragment method 0: As written in the spectrum or CID if no info (Default), 1: CID, 2: ETD, 3: HCD, 4: Merge spectra from the same precursor'])
|
49
49
|
search_tool.add_boolean_option(:decoy_search,false,['--decoy-search', 'Build and search a decoy database on the fly. Input db should not contain decoys if this option is used'])
|
50
50
|
search_tool.add_value_option(:protocol,0,['--protocol p', '0: NoProtocol (Default), 1: Phosphorylation, 2: iTRAQ, 3: iTRAQPhospho'])
|
51
|
-
search_tool.add_value_option(:min_pep_length,6,['--min-pep-length p', 'Minimum peptide length to consider
|
52
|
-
search_tool.add_value_option(:max_pep_length,40,['--max-pep-length p', 'Maximum peptide length to consider
|
53
|
-
search_tool.add_value_option(:min_pep_charge,2,['--min-pep-charge c', 'Minimum precursor charge to consider if charges are not specified in the spectrum file
|
54
|
-
search_tool.add_value_option(:max_pep_charge,3,['--max-pep-charge c', 'Maximum precursor charge to consider if charges are not specified in the spectrum file
|
51
|
+
search_tool.add_value_option(:min_pep_length,6,['--min-pep-length p', 'Minimum peptide length to consider'])
|
52
|
+
search_tool.add_value_option(:max_pep_length,40,['--max-pep-length p', 'Maximum peptide length to consider'])
|
53
|
+
search_tool.add_value_option(:min_pep_charge,2,['--min-pep-charge c', 'Minimum precursor charge to consider if charges are not specified in the spectrum file'])
|
54
|
+
search_tool.add_value_option(:max_pep_charge,3,['--max-pep-charge c', 'Maximum precursor charge to consider if charges are not specified in the spectrum file'])
|
55
55
|
search_tool.add_value_option(:num_reported_matches,1,['--num-reported-matches n', 'Number of matches per spectrum to be reported, Default: 1'])
|
56
56
|
search_tool.add_boolean_option(:add_features,false,['--add-features', 'output additional features'])
|
57
57
|
search_tool.add_value_option(:java_mem,"3500M",['--java-mem mem','Java memory limit when running the search (Default 3.5Gb)'])
|
@@ -208,23 +208,32 @@ ARGV.each do |filename|
|
|
208
208
|
cmd << " -mod #{mods_path}"
|
209
209
|
end
|
210
210
|
|
211
|
+
|
211
212
|
# As a final part of the command we convert to pepxml
|
212
213
|
if search_tool.pepxml
|
213
214
|
#if search_tool.explicit_output
|
214
215
|
cmd << ";ruby -pi.bak -e \"gsub('post=\\\"?','post=\\\"X')\" #{mzid_output_path}"
|
215
216
|
cmd << ";ruby -pi.bak -e \"gsub('pre=\\\"?','pre=\\\"X')\" #{mzid_output_path}"
|
216
|
-
cmd << ";
|
217
|
+
cmd << ";ruby -pi.bak -e \"gsub('id=\\\"UnspecificCleavage\\\"','id=\\\"UnspecificCleavage\\\" name=\\\"unspecific cleavage\\\"')\" #{mzid_output_path}"
|
218
|
+
|
219
|
+
idconvert_relative_output_dir = (0...10).map { ('a'..'z').to_a[rand(26)] }.join
|
220
|
+
|
221
|
+
# require 'byebug';byebug
|
222
|
+
|
223
|
+
idconvert_output_dir = "#{Pathname.new(mzid_output_path).dirname}/#{idconvert_relative_output_dir}"
|
224
|
+
cmd << ";idconvert #{mzid_output_path} --pepXML -o #{idconvert_output_dir}"
|
217
225
|
|
218
226
|
|
219
|
-
|
227
|
+
cmd << "; pep_xml_output_path=`ls #{idconvert_output_dir}/*.pepXML`; echo $pep_xml_output_path"
|
228
|
+
#"#{mzid_output_path.chomp('.mzid')}.pepXML"
|
220
229
|
|
221
230
|
# Fix the msms_run_summary base_name attribute
|
222
231
|
#
|
223
232
|
if for_galaxy
|
224
|
-
cmd << ";ruby -pi.bak -e \"gsub(/ base_name=[^ ]+/,' base_name=\\\"#{original_input_file}\\\"')\"
|
233
|
+
cmd << ";ruby -pi.bak -e \"gsub(/ base_name=[^ ]+/,' base_name=\\\"#{original_input_file}\\\"')\" $pep_xml_output_path"
|
225
234
|
end
|
226
235
|
#Then copy the pepxml to the final output path
|
227
|
-
cmd << "; mv
|
236
|
+
cmd << "; mv ${pep_xml_output_path} '#{output_path}'"
|
228
237
|
else
|
229
238
|
cmd << "; mv #{mzid_output_path} #{output_path}"
|
230
239
|
end
|
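Note on `msgfplus_search.rb`: the new code sends `idconvert` output to a randomly named directory next to the mzid file and then picks up whatever `*.pepXML` file appears there, presumably because the converter chooses the output file name itself. A minimal sketch of the path construction follows; the helper names and the example path are assumptions for illustration.

```ruby
require 'pathname'

# Ten random lowercase letters, as in the diff
def random_dir_name(length = 10)
  letters = ('a'..'z').to_a
  (0...length).map { letters[rand(26)] }.join
end

# Place the scratch directory alongside the mzid output file
def idconvert_output_dir(mzid_output_path)
  "#{Pathname.new(mzid_output_path).dirname}/#{random_dir_name}"
end

puts idconvert_output_dir('/tmp/run/out.mzid')
```

The directory name differs on every run, so repeated or concurrent searches writing next to the same mzid file are unlikely to collide.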
data/bin/mzid_to_protxml.rb
CHANGED
@@ -29,7 +29,7 @@ tool.option_parser.banner = "Convert an mzIdentML file to protXML.\n\nUsage: mzi
|
|
29
29
|
exit unless tool.check_options(true)
|
30
30
|
|
31
31
|
$protk = Constants.instance
|
32
|
-
log_level = tool.debug ? "
|
32
|
+
log_level = tool.debug ? "debug" : "info"
|
33
33
|
$protk.info_level= log_level
|
34
34
|
|
35
35
|
input_file=ARGV[0]
|
@@ -42,6 +42,7 @@ end
|
|
42
42
|
|
43
43
|
prot_xml_writer = ProtXMLWriter.new
|
44
44
|
|
45
|
+
$protk.log "Parsing MzIdentML input file" , :info
|
45
46
|
mzid_doc = MzIdentMLDoc.new(input_file)
|
46
47
|
|
47
48
|
protein_groups = mzid_doc.protein_groups
|
@@ -59,11 +60,13 @@ protein_groups.each do |group_node|
|
|
59
60
|
$stdout.write "Scanned #{i} and read #{n_written} of #{n_prots}\r"
|
60
61
|
end
|
61
62
|
|
62
|
-
# require 'byebug';byebug
|
63
|
-
group_prob = MzIdentMLDoc.get_cvParam(group_node,"MS:1002470").attributes['value'].to_f*0.01
|
64
63
|
|
65
|
-
|
66
|
-
|
64
|
+
group_prob = mzid_doc.get_cvParam(group_node,"MS:1002470").attributes['value'].to_f*0.01
|
65
|
+
|
66
|
+
if group_prob >= tool.minprob.to_f
|
67
|
+
$stdout.write "\n" if tool.debug
|
68
|
+
$protk.log "Writing group with probability #{group_prob}" , :info
|
69
|
+
group = ProteinGroup.from_mzid(group_node,mzid_doc,tool.minprob.to_f)
|
67
70
|
prot_xml_writer.append_protein_group(group.as_protxml)
|
68
71
|
n_written+=1
|
69
72
|
end
|
data/bin/peptide_prophet.rb
CHANGED
@@ -44,6 +44,7 @@ prophet_tool.add_boolean_option(:force_fit,false,['--force-fit',"Force fitting o
|
|
44
44
|
prophet_tool.add_boolean_option(:allow_alt_instruments,false,['--allow-alt-instruments',"Warning instead of exit with error if instrument types between runs is different"])
|
45
45
|
prophet_tool.add_boolean_option(:one_ata_time,false,['-F', '--one-ata-time', 'Create a separate pproph output file for each analysis'])
|
46
46
|
prophet_tool.add_value_option(:decoy_prefix,"decoy",['--decoy-prefix prefix', 'Prefix for decoy sequences'])
|
47
|
+
prophet_tool.add_boolean_option(:use_non_parametric_model,false,['--use-non-parametric-model', 'Use Non-parametric model, can only be used with decoy option'])
|
47
48
|
prophet_tool.add_boolean_option(:no_decoys,false,['--no-decoy', 'Don\'t use decoy sequences to pin down the negative distribution'])
|
48
49
|
prophet_tool.add_value_option(:experiment_label,nil,['--experiment-label label','used to commonly label all spectra belonging to one experiment (required by iProphet)'])
|
49
50
|
|
@@ -194,7 +195,7 @@ def generate_command(genv,prophet_tool,inputs,output,database,engine,enzyme)
|
|
194
195
|
if prophet_tool.useicat
|
195
196
|
cmd << " -Oi "
|
196
197
|
else
|
197
|
-
cmd << " -Of"
|
198
|
+
cmd << " -Of "
|
198
199
|
end
|
199
200
|
|
200
201
|
if prophet_tool.maldi
|
@@ -209,6 +210,10 @@ def generate_command(genv,prophet_tool,inputs,output,database,engine,enzyme)
|
|
209
210
|
cmd << " -d#{prophet_tool.decoy_prefix} -Od "
|
210
211
|
end
|
211
212
|
|
213
|
+
if prophet_tool.use_non_parametric_model
|
214
|
+
cmd << " -OP "
|
215
|
+
end
|
216
|
+
|
212
217
|
cmd << " -p#{prophet_tool.probability_threshold}"
|
213
218
|
|
214
219
|
if ( inputs.class==Array)
|
data/bin/protxml_to_gff.rb
CHANGED
@@ -58,6 +58,15 @@ def protein_id_to_protdbid(protein_id)
|
|
58
58
|
return protein_id
|
59
59
|
end
|
60
60
|
|
61
|
+
def protein_is_included(protein,protein_probability_threshold,ignore_regex)
|
62
|
+
pass_probability_thresh = (protein.probability >= protein_probability_threshold)
|
63
|
+
pass_regex = true
|
64
|
+
if ignore_regex && (protein.protein_name =~ /#{ignore_regex}/)
|
65
|
+
pass_regex=false
|
66
|
+
end
|
67
|
+
return (pass_regex && pass_probability_thresh)
|
68
|
+
end
|
69
|
+
|
61
70
|
def prepare_fasta(database_path,type)
|
62
71
|
db_filename = nil
|
63
72
|
case
|
@@ -91,6 +100,7 @@ tool.add_value_option(:peptide_probability_threshold,0.95,['--threshold prob','P
|
|
91
100
|
tool.add_value_option(:protein_probability_threshold,0.99,['--prot-threshold prob','Protein Probability Threshold (Default 0.99)'])
|
92
101
|
tool.add_value_option(:gff_idregex,nil,['--gff-idregex pre','Regex with capture group for parsing gff ids from protein ids'])
|
93
102
|
tool.add_value_option(:genome_idregex,nil,['--genome-idregex pre','Regex with capture group for parsing genomic ids from protein ids'])
|
103
|
+
tool.add_value_option(:ignore_regex,nil,['--ignore-regex pre','Regex to match protein ids that we should ignore completely'])
|
94
104
|
|
95
105
|
exit unless tool.check_options(true,[:database,:coords_file])
|
96
106
|
|
@@ -126,7 +136,7 @@ num_missing_gff_entries = 0
|
|
126
136
|
|
127
137
|
proteins.each do |protein|
|
128
138
|
|
129
|
-
if protein.
|
139
|
+
if protein_is_included(protein,tool.protein_probability_threshold.to_f,tool.ignore_regex)
|
130
140
|
|
131
141
|
begin
|
132
142
|
$protk.log "Mapping #{protein.protein_name}", :info
|
@@ -155,7 +165,7 @@ proteins.each do |protein|
|
|
155
165
|
peptides = tool.stack_charge_states ? protein.peptides : protein.representative_peptides
|
156
166
|
|
157
167
|
peptides.each do |peptide|
|
158
|
-
if peptide.probability >= tool.peptide_probability_threshold
|
168
|
+
if peptide.probability >= tool.peptide_probability_threshold.to_f
|
159
169
|
peptide_entries = peptide.to_gff3_records(protein_entry.aaseq,gff_parent_entry,gff_cds_entries)
|
160
170
|
peptide_entries.each do |peptide_entry|
|
161
171
|
output_fh.write peptide_entry.to_s
|
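Note on `protxml_to_gff.rb`: the new `protein_is_included` helper combines the probability threshold with the optional `--ignore-regex`, and both thresholds are now passed through `.to_f`. The sketch below restates the check on a `Struct` stand-in for protk's protein objects; it assumes option values may arrive as strings when given on the command line, which would explain the added conversions.

```ruby
# Stand-in (assumption) for the protein objects read from the protXML file
Protein = Struct.new(:protein_name, :probability)

# A protein is included when it is not matched by the ignore regex
# and its probability meets the threshold.
def protein_included?(protein, threshold, ignore_regex)
  return false if ignore_regex && protein.protein_name =~ /#{ignore_regex}/
  protein.probability >= threshold
end

good  = Protein.new('sp|P12345', 0.995)
decoy = Protein.new('decoy_sp|P12345', 0.995)

# A threshold supplied as a string must be converted before comparing:
# 0.995 >= "0.99" raises ArgumentError, while "0.99".to_f works.
threshold = '0.99'.to_f

puts protein_included?(good, threshold, nil)         # passes the threshold
puts protein_included?(decoy, threshold, '^decoy_')  # excluded by the ignore regex
```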
data/lib/protk/constants.rb
CHANGED
data/lib/protk/mzidentml_doc.rb
CHANGED
@@ -7,6 +7,31 @@ class MzIdentMLDoc < Object
|
|
7
7
|
MZID_NS_PREFIX="mzidentml"
|
8
8
|
MZID_NS='http://psidev.info/psi/pi/mzIdentML/1.1'
|
9
9
|
|
10
|
+
attr :psms_cache
|
11
|
+
attr :db_sequence_cache
|
12
|
+
|
13
|
+
def psms_cache
|
14
|
+
if !@psms_cache
|
15
|
+
@psms_cache={}
|
16
|
+
Constants.instance.log "Generating psm index" , :debug
|
17
|
+
self.psms.each do |spectrum_identification_item|
|
18
|
+
@psms_cache[spectrum_identification_item.attributes['id']]=spectrum_identification_item
|
19
|
+
end
|
20
|
+
end
|
21
|
+
@psms_cache
|
22
|
+
end
|
23
|
+
|
24
|
+
def dbsequence_cache
|
25
|
+
if !@dbsequence_cache
|
26
|
+
@dbsequence_cache={}
|
27
|
+
Constants.instance.log "Generating DB index" , :debug
|
28
|
+
self.dbsequences.each do |db_sequence|
|
29
|
+
@dbsequence_cache[db_sequence.attributes['accession']]=db_sequence
|
30
|
+
end
|
31
|
+
end
|
32
|
+
@dbsequence_cache
|
33
|
+
end
|
34
|
+
|
10
35
|
def initialize(path)
|
11
36
|
parser=XML::Parser.file(path)
|
12
37
|
@document=parser.parse
|
@@ -25,6 +50,10 @@ class MzIdentMLDoc < Object
|
|
25
50
|
@document.find("//#{MZID_NS_PREFIX}:SpectrumIdentificationItem","#{MZID_NS_PREFIX}:#{MZID_NS}")
|
26
51
|
end
|
27
52
|
|
53
|
+
def dbsequences
|
54
|
+
@document.find("//#{MZID_NS_PREFIX}:DBSequence","#{MZID_NS_PREFIX}:#{MZID_NS}")
|
55
|
+
end
|
56
|
+
|
28
57
|
def protein_groups
|
29
58
|
@document.find("//#{MZID_NS_PREFIX}:ProteinAmbiguityGroup","#{MZID_NS_PREFIX}:#{MZID_NS}")
|
30
59
|
end
|
@@ -55,17 +84,22 @@ class MzIdentMLDoc < Object
|
|
55
84
|
node.find("#{pp}#{MZID_NS_PREFIX}:#{expression}","#{MZID_NS_PREFIX}:#{MZID_NS}")
|
56
85
|
end
|
57
86
|
|
87
|
+
def find(node,expression,root=false)
|
88
|
+
MzIdentMLDoc.find(node,expression,root)
|
89
|
+
end
|
90
|
+
|
58
91
|
|
59
|
-
def
|
92
|
+
def get_cvParam(mzidnode,accession)
|
60
93
|
self.find(mzidnode,"cvParam[@accession=\'#{accession}\']")[0]
|
61
94
|
end
|
62
95
|
|
63
|
-
def
|
64
|
-
self.
|
96
|
+
def get_dbsequence(mzidnode,accession)
|
97
|
+
self.dbsequence_cache[accession]
|
98
|
+
# self.find(mzidnode,"DBSequence[@accession=\'#{accession}\']",true)[0]
|
65
99
|
end
|
66
100
|
|
67
101
|
# As per PeptideShaker. Assume group probability used for protein if it is group rep otherwise 0
|
68
|
-
def
|
102
|
+
def get_protein_probability(protein_node)
|
69
103
|
|
70
104
|
#MS:1002403
|
71
105
|
is_group_representative=(self.get_cvParam(protein_node,"MS:1002403")!=nil)
|
@@ -76,28 +110,38 @@ class MzIdentMLDoc < Object
|
|
76
110
|
end
|
77
111
|
end
|
78
112
|
|
79
|
-
|
80
|
-
|
113
|
+
# Memoized because it gets called for every protein in a group
|
114
|
+
def get_proteins_for_group(group_node)
|
115
|
+
@proteins_for_group_cache ||= Hash.new do |h,key|
|
116
|
+
h[key] = self.find(group_node,"ProteinDetectionHypothesis")
|
117
|
+
end
|
118
|
+
@proteins_for_group_cache[group_node]
|
81
119
|
end
|
82
120
|
|
83
121
|
# def self.get_sister_proteins(protein_node)
|
84
122
|
# self.find(protein_node.parent,"ProteinDetectionHypothesis")
|
85
123
|
# end
|
86
124
|
|
87
|
-
def
|
125
|
+
def get_peptides_for_protein(protein_node)
|
88
126
|
self.find(protein_node,"PeptideHypothesis")
|
89
127
|
end
|
90
128
|
|
91
129
|
# <PeptideHypothesis peptideEvidence_ref="PepEv_1">
|
92
130
|
# <SpectrumIdentificationItemRef spectrumIdentificationItem_ref="SII_1_1"/>
|
93
131
|
# </PeptideHypothesis>
|
94
|
-
def
|
132
|
+
def get_best_psm_for_peptide(peptide_node)
|
133
|
+
|
134
|
+
|
95
135
|
|
96
136
|
best_score=-1
|
97
137
|
best_psm=nil
|
98
|
-
self.find(peptide_node,"SpectrumIdentificationItemRef")
|
138
|
+
spectrumidrefs = self.find(peptide_node,"SpectrumIdentificationItemRef")
|
139
|
+
Constants.instance.log "Searching from among #{spectrumidrefs.length} for best psm" , :debug
|
140
|
+
|
141
|
+
spectrumidrefs.each do |id_ref_node|
|
99
142
|
id_ref = id_ref_node.attributes['spectrumIdentificationItem_ref']
|
100
|
-
psm_node = self.find(peptide_node,"SpectrumIdentificationItem[@id=\'#{id_ref}\']",true)[0]
|
143
|
+
# psm_node = self.find(peptide_node,"SpectrumIdentificationItem[@id=\'#{id_ref}\']",true)[0]
|
144
|
+
psm_node = self.psms_cache[id_ref]
|
101
145
|
score = self.get_cvParam(psm_node,"MS:1002466")['value'].to_f
|
102
146
|
if score>best_score
|
103
147
|
best_psm=psm_node
|
@@ -107,7 +151,7 @@ class MzIdentMLDoc < Object
|
|
107
151
|
best_psm
|
108
152
|
end
|
109
153
|
|
110
|
-
def
|
154
|
+
def get_sequence_for_peptide(peptide_node)
|
111
155
|
evidence_ref = peptide_node.attributes['peptideEvidence_ref']
|
112
156
|
pep_ref = peptide_node.find("//#{MZID_NS_PREFIX}:PeptideEvidence[@id=\'#{evidence_ref}\']","#{MZID_NS_PREFIX}:#{MZID_NS}")[0].attributes['peptide_ref']
|
113
157
|
peptide=peptide_node.find("//#{MZID_NS_PREFIX}:Peptide[@id=\'#{pep_ref}\']","#{MZID_NS_PREFIX}:#{MZID_NS}")[0]
|
@@ -115,13 +159,13 @@ class MzIdentMLDoc < Object
|
|
115
159
|
peptide.find("./#{MZID_NS_PREFIX}:PeptideSequence","#{MZID_NS_PREFIX}:#{MZID_NS}")[0].content
|
116
160
|
end
|
117
161
|
|
118
|
-
def
|
162
|
+
def get_sequence_for_psm(psm_node)
|
119
163
|
pep_ref = psm_node.attributes['peptide_ref']
|
120
164
|
peptide=psm_node.find("//#{MZID_NS_PREFIX}:Peptide[@id=\'#{pep_ref}\']","#{MZID_NS_PREFIX}:#{MZID_NS}")[0]
|
121
165
|
peptide.find("./#{MZID_NS_PREFIX}:PeptideSequence","#{MZID_NS_PREFIX}:#{MZID_NS}")[0].content
|
122
166
|
end
|
123
167
|
|
124
|
-
def
|
168
|
+
def get_peptide_evidence_from_psm(psm_node)
|
125
169
|
pe_nodes = []
|
126
170
|
self.find(psm_node,"PeptideEvidenceRef").each do |pe_node|
|
127
171
|
ev_id=pe_node.attributes['peptideEvidence_ref']
|
data/lib/protk/peptide.rb
CHANGED
@@ -45,15 +45,15 @@ class Peptide
|
|
45
45
|
# <cvParam cvRef="PSI-MS" accession="MS:1001093" name="sequence coverage" value="0.0"/>
|
46
46
|
# </ProteinDetectionHypothesis>
|
47
47
|
|
48
|
-
def from_mzid(xmlnode)
|
48
|
+
def from_mzid(xmlnode,mzid_doc)
|
49
49
|
pep=new()
|
50
|
-
pep.sequence=
|
51
|
-
best_psm =
|
50
|
+
pep.sequence=mzid_doc.get_sequence_for_peptide(xmlnode)
|
51
|
+
best_psm = mzid_doc.get_best_psm_for_peptide(xmlnode)
|
52
52
|
# require 'byebug';byebug
|
53
|
-
pep.probability =
|
54
|
-
pep.theoretical_neutral_mass =
|
53
|
+
pep.probability = mzid_doc.get_cvParam(best_psm,"MS:1002466")['value'].to_f
|
54
|
+
pep.theoretical_neutral_mass = mzid_doc.get_cvParam(best_psm,"MS:1001117")['value'].to_f
|
55
55
|
pep.charge = best_psm.attributes['chargeState'].to_i
|
56
|
-
pep.protein_name =
|
56
|
+
pep.protein_name = mzid_doc.get_dbsequence(xmlnode.parent,xmlnode.parent.attributes['dBSequence_ref']).attributes['accession']
|
57
57
|
|
58
58
|
# pep.charge = MzIdentMLDoc.get_charge_for_psm(best_psm)
|
59
59
|
|
data/lib/protk/prophet_tool.rb
CHANGED
data/lib/protk/protein.rb
CHANGED
@@ -84,26 +84,33 @@ class Protein
  # This is hacked together to work for a specific PeptideShaker output type
  # Refactor and properly respect cvParams for real conversion
  #
- def from_mzid(xmlnode)
+ def from_mzid(xmlnode,mzid_doc)

  coverage_cvparam=""
  prot=new()
  groupnode = xmlnode.parent

  prot.group_number=groupnode.attributes['id'].split("_").last.to_i+1
- prot.protein_name=
- prot.n_indistinguishable_proteins=MzIdentMLDoc.get_proteins_for_group(groupnode).length
- prot.group_probability=MzIdentMLDoc.get_cvParam(groupnode,"MS:1002470").attributes['value'].to_f
+ prot.protein_name=mzid_doc.get_dbsequence(xmlnode,xmlnode.attributes['dBSequence_ref']).attributes['accession']

-
+ prot.n_indistinguishable_proteins=mzid_doc.get_proteins_for_group(groupnode).length
+
+
+ prot.group_probability=mzid_doc.get_cvParam(groupnode,"MS:1002470").attributes['value'].to_f
+
+
+ coverage_node=mzid_doc.get_cvParam(xmlnode,"MS:1001093")

  prot.percent_coverage=coverage_node.attributes['value'].to_f if coverage_node
- prot.probability =
+ prot.probability = mzid_doc.get_protein_probability(xmlnode)
  # require 'byebug';byebug

- peptide_nodes=
+ peptide_nodes=mzid_doc.get_peptides_for_protein(xmlnode)
+
+ prot.peptides = peptide_nodes.collect { |e| Peptide.from_mzid(e,mzid_doc) }
+
+ Constants.instance.log "Generated protein entry with probability #{prot.probability}" , :debug

- prot.peptides = peptide_nodes.collect { |e| Peptide.from_mzid(e) }
  prot
  end

data/lib/protk/protein_group.rb
CHANGED
@@ -35,18 +35,25 @@ class ProteinGroup
  # This is hacked together to work for a specific PeptideShaker output type
  # Refactor and properly respect cvParams for real conversion
  #
- def from_mzid(groupnode)
+ def from_mzid(groupnode,mzid_doc,minprob=0)

  group=new()

  group.group_number=groupnode.attributes['id'].split("_").last.to_i+1
- group.group_probability=
+ group.group_probability=mzid_doc.get_cvParam(groupnode,"MS:1002470").attributes['value'].to_f

  # require 'byebug';byebug

- protein_nodes=
+ protein_nodes=mzid_doc.get_proteins_for_group(groupnode)
+
+
+
+ group_members = protein_nodes.select do |e|
+ mzid_doc.get_protein_probability(e)>=minprob
+ end
+
+ group.proteins = group_members.collect { |e| Protein.from_mzid(e,mzid_doc) }

- group.proteins = protein_nodes.collect { |e| Protein.from_mzid(e) }
  group
  end

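The new `minprob` argument to `ProteinGroup.from_mzid` keeps only proteins whose probability is at least the threshold, and the default of 0 keeps every protein. A minimal sketch of that filtering step follows. `ProteinStub` and `select_group_members` are hypothetical stand-ins for illustration, not protk classes or methods, and the probabilities are made-up values.

```ruby
# Hypothetical stand-in for a protein node that already has a probability.
ProteinStub = Struct.new(:name, :probability)

# Mirrors the select block in the diff: keep members with probability >= minprob.
def select_group_members(proteins, minprob = 0)
  proteins.select { |protein| protein.probability >= minprob }
end

candidates = [
  ProteinStub.new("protA", 0.99),
  ProteinStub.new("protB", 0.40),
  ProteinStub.new("protC", 0.0)
]

kept_default = select_group_members(candidates)       # default threshold of 0
kept_strict  = select_group_members(candidates, 0.95) # stricter threshold

puts kept_default.map(&:name).inspect
puts kept_strict.map(&:name).inspect
```

Because the comparison is `>=`, a protein whose probability equals the threshold is retained.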
data/lib/protk/psm.rb
CHANGED
@@ -26,7 +26,7 @@ class PeptideEvidence
  # dBSequence_ref="JEMP01000193.1_rev_g3500.t1" id="PepEv_1" />
  class << self

- def from_mzid(pe_node)
+ def from_mzid(pe_node,mzid_doc)
  pe = new()
  pe.peptide_prev_aa=pe_node.attributes['pre']
  pe.peptide_next_aa=pe_node.attributes['post']
@@ -45,7 +45,7 @@ class PeptideEvidence
  # name="protein description" value="280755|283436" />
  # </DBSequence>
  pe.protein=prot_node.attributes['accession']
- pe.protein_descr=
+ pe.protein_descr=mzid_doc.get_cvParam(prot_node,"MS:1001088")['value']


  # pe.peptide_sequence=pep_node
@@ -163,11 +163,11 @@ class PSM



- def from_mzid(psm_node)
+ def from_mzid(psm_node,mzid_doc)
  psm = new()
- psm.peptide =
- peptide_evidence_nodes =
- psm.peptide_evidence = peptide_evidence_nodes.collect { |pe| PeptideEvidence.from_mzid(pe) }
+ psm.peptide = mzid_doc.get_sequence_for_psm(psm_node)
+ peptide_evidence_nodes = mzid_doc.get_peptide_evidence_from_psm(psm_node)
+ psm.peptide_evidence = peptide_evidence_nodes.collect { |pe| PeptideEvidence.from_mzid(pe,mzid_doc) }

  psm.calculated_mz = psm_node.attributes['calculatedMassToCharge'].to_f
  psm.experimental_mz = psm_node.attributes['experimentalMassToCharge'].to_f
data/lib/protk/search_tool.rb
CHANGED
@@ -34,13 +34,13 @@ class SearchTool < Tool
  end

  if ( option_support.include? :mass_tolerance_units )
- add_value_option(:fragment_tolu,"Da",['--fragment-ion-tol-units tolu', 'Fragment ion mass tolerance units (Da or mmu).
- add_value_option(:precursor_tolu,"ppm",['--precursor-ion-tol-units tolu', 'Precursor ion mass tolerance units (ppm or Da).
+ add_value_option(:fragment_tolu,"Da",['--fragment-ion-tol-units tolu', 'Fragment ion mass tolerance units (Da or mmu).'])
+ add_value_option(:precursor_tolu,"ppm",['--precursor-ion-tol-units tolu', 'Precursor ion mass tolerance units (ppm or Da).'])
  end

  if ( option_support.include? :mass_tolerance )
- add_value_option(:fragment_tol,0.65,['-f', '--fragment-ion-tol tol', 'Fragment ion mass tolerance (unit dependent).
- add_value_option(:precursor_tol,200,['-p','--precursor-ion-tol tol', 'Precursor ion mass tolerance.
+ add_value_option(:fragment_tol,0.65,['-f', '--fragment-ion-tol tol', 'Fragment ion mass tolerance (unit dependent).'])
+ add_value_option(:precursor_tol,200,['-p','--precursor-ion-tol tol', 'Precursor ion mass tolerance.'])
  end

  if ( option_support.include? :precursor_search_type )
@@ -64,7 +64,7 @@ class SearchTool < Tool
  end

  if ( option_support.include? :searched_ions )
- add_value_option(:searched_ions,"",['--searched-ions si', 'Ion series to search
+ add_value_option(:searched_ions,"",['--searched-ions si', 'Ion series to search'])
  end

  if ( option_support.include? :multi_isotope_search )
data/lib/protk/spectrum_query.rb
CHANGED
@@ -86,12 +86,12 @@ class SpectrumQuery
  # unitAccession="UO:0000010" unitName="seconds" />
  # </SpectrumIdentificationResult>

- def from_mzid(query_node)
+ def from_mzid(query_node,mzid_doc)
  query = new()
- query.spectrum_title =
- query.retention_time =
+ query.spectrum_title = mzid_doc.get_cvParam(query_node,"MS:1000796")['value'].to_s
+ query.retention_time = mzid_doc.get_cvParam(query_node,"MS:1000894")['value'].to_f
  items = MzIdentMLDoc.find(query_node,"SpectrumIdentificationItem")
- query.psms = items.collect { |item| PSM.from_mzid(item) }
+ query.psms = items.collect { |item| PSM.from_mzid(item,mzid_doc) }
  query
  end

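The added lines look up a `cvParam` child by its accession and read its `value` attribute (`MS:1000796` for the spectrum title, `MS:1000894` for the retention time). The diff does not show how `mzid_doc.get_cvParam` is implemented, so the sketch below only illustrates the lookup idea using REXML as a stand-in parser. `find_cv_param` is a hypothetical helper, and the XML fragment is an invented, namespace-free sample.

```ruby
require 'rexml/document'

# Hypothetical helper: return the first cvParam child of node with the given
# accession, or nil if there is none. Not the protk implementation.
def find_cv_param(node, accession)
  node.elements["cvParam[@accession='#{accession}']"]
end

# Invented sample fragment for illustration only.
sample_xml = <<~XML
  <SpectrumIdentificationResult id="SIR_1" spectrumID="index=0">
    <cvParam cvRef="PSI-MS" accession="MS:1000796" name="spectrum title" value="scan 42"/>
    <cvParam cvRef="PSI-MS" accession="MS:1000894" name="retention time" value="1234.5"
             unitAccession="UO:0000010" unitName="seconds"/>
  </SpectrumIdentificationResult>
XML

query_node = REXML::Document.new(sample_xml).root

spectrum_title = find_cv_param(query_node, "MS:1000796").attributes['value'].to_s
retention_time = find_cv_param(query_node, "MS:1000894").attributes['value'].to_f

puts spectrum_title
puts retention_time
```

In this sketch a missing accession yields `nil`, so chaining `.attributes` directly would raise `NoMethodError`; the `Protein.from_mzid` hunk guards its coverage lookup with `if coverage_node` for the same kind of reason.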
data/lib/protk/tool.rb
CHANGED
@@ -26,8 +26,8 @@ class Tool
  # Options set from the command-line
  #
  attr :options, false
-
- # The option parser used to parse command-line options.
+
+ # The option parser used to parse command-line options.
  #
  attr :option_parser

@@ -62,19 +62,27 @@ class Tool
  super
  end
  end
-
-
-
+
+ def add_default_to_help(default_value,opts)
+ if default_value!=nil && default_value!=" " && default_value!=""
+ opts[-1] = "#{opts.last} [#{default_value.to_s}]"
+ end
+ opts
+ end
+
+ def add_value_option(symbol,default_value,opts)
  @options[symbol]=default_value
+ opts=add_default_to_help(default_value,opts)
  @option_parser.on(*opts) do |val|
  @options[symbol]=val
  @options_defined_by_user[symbol]=opts
  end
  end
-
+
  def add_boolean_option(symbol,default_value,opts)
  @options[symbol]=default_value
-
+ opts=add_default_to_help(default_value,opts)
+ @option_parser.on(*opts) do
  @options[symbol]=!default_value
  @options_defined_by_user[symbol]=opts
  end
@@ -92,10 +100,10 @@ class Tool
  options.encoding = "utf8"
  options.transfer_type = :auto
  options.verbose = false
-
+
  @options_defined_by_user={}

- @option_parser=OptionParser.new do |opts|
+ @option_parser=OptionParser.new do |opts|

  opts.on( '-h', '--help', 'Display this screen' ) do
  puts opts
@@ -108,7 +116,7 @@ class Tool
  end

  if ( option_support.include? :over_write)
- add_boolean_option(:over_write,false,['-r', '--replace-output', 'Dont skip analyses for which the output file already exists'])
+ add_boolean_option(:over_write,false,['-r', '--replace-output', 'Dont skip analyses for which the output file already exists'])
  end

  if ( option_support.include? :explicit_output )
@@ -120,7 +128,7 @@ class Tool
  end

  if ( option_support.include? :database)
- add_value_option(:database,"sphuman",['-d', '--database dbname', 'Specify the database to use for this search. Can be a named protk database or the path to a fasta file'])
+ add_value_option(:database,"sphuman",['-d', '--database dbname', 'Specify the database to use for this search. Can be a named protk database or the path to a fasta file'])
  end

  if (option_support.include? :debug)
@@ -169,37 +177,37 @@ class Tool
  return true
  end
  missing = mandatory.select{ |param| self.send(param).nil? }
- if not missing.empty?
- puts "Missing options: #{missing.join(', ')}"
- puts self.option_parser
- return false
- end
- rescue OptionParser::InvalidOption, OptionParser::MissingArgument
- puts $!.to_s
- puts self.option_parser
- return false
+ if not missing.empty?
+ puts "Missing options: #{missing.join(', ')}"
+ puts self.option_parser
+ return false
+ end
+ rescue OptionParser::InvalidOption, OptionParser::MissingArgument
+ puts $!.to_s
+ puts self.option_parser
+ return false
  end

  if ( require_input_file && ARGV[0].nil? )
  puts "You must supply an input file"
- puts self.option_parser
+ puts self.option_parser
  return false
  end

  return true
- end
-
+ end
+
  # Run the search tool using the given command string and global environment
  #
  def run(cmd,genv,autodelete=true)
  cmd_runner=CommandRunner.new(genv)
  cmd_runner.run_local(cmd)
  end
-
+

  def database_info
  case
- when Pathname.new(@options.database).exist? # It's an explicitly named db
+ when Pathname.new(@options.database).exist? # It's an explicitly named db
  db_path=Pathname.new(@options.database).expand_path.to_s
  db_name=Pathname.new(@options.database).basename.to_s
  else
@@ -211,4 +219,4 @@ class Tool



- end
+ end
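The `add_default_to_help` method added in this file appends the default value, in square brackets, to the last element of the option spec (the help text) before the spec is passed to `OptionParser#on`. The sketch below copies that method from the diff and exercises it outside the `Tool` class. The surrounding `options` hash, the parser setup, and the shortened help strings are assumptions for illustration only.

```ruby
require 'optparse'

# Copied from the diff: append "[default]" to the help string unless the
# default is nil, a single space, or an empty string.
def add_default_to_help(default_value, opts)
  if default_value != nil && default_value != " " && default_value != ""
    opts[-1] = "#{opts.last} [#{default_value.to_s}]"
  end
  opts
end

# Illustrative harness (not protk code): a plain hash instead of Tool#options.
options = { database: "sphuman", searched_ions: "" }
parser = OptionParser.new

db_spec = add_default_to_help("sphuman", ['-d', '--database dbname', 'Database to search'])
parser.on(*db_spec) { |val| options[:database] = val }

# An empty-string default leaves the help text untouched.
ions_spec = add_default_to_help("", ['--searched-ions si', 'Ion series to search'])
parser.on(*ions_spec) { |val| options[:searched_ions] = val }

help_text = parser.help
parser.parse(['-d', 'mydb'])

puts help_text
puts options[:database]
```

Note that the method modifies the array it is given, so an option spec reused elsewhere would carry the bracketed default with it.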
metadata
CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: protk
  version: !ruby/object:Gem::Version
- version: 1.4.
+ version: 1.4.3
  platform: ruby
  authors:
  - Ira Cooke
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2015-
+ date: 2015-10-21 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: open4
@@ -36,7 +36,7 @@ dependencies:
  requirements:
  - - ~>
  - !ruby/object:Gem::Version
- version:
+ version: 1.4.3
  - - '>='
  - !ruby/object:Gem::Version
  version: 1.4.3
@@ -46,7 +46,7 @@ dependencies:
  requirements:
  - - ~>
  - !ruby/object:Gem::Version
- version:
+ version: 1.4.3
  - - '>='
  - !ruby/object:Gem::Version
  version: 1.4.3
@@ -210,6 +210,7 @@ executables:
  - mzid_to_pepxml.rb
  - spectrast_create.rb
  - spectrast_filter.rb
+ - filter_psms.rb
  extensions:
  - ext/decoymaker/extconf.rb
  extra_rdoc_files: []
@@ -217,6 +218,8 @@ files:
  - README.md
  - bin/add_retention_times.rb
  - bin/augustus_to_proteindb.rb
+ - bin/filter_fasta.rb
+ - bin/filter_psms.rb
  - bin/interprophet.rb
  - bin/make_decoy.rb
  - bin/manage_db.rb
|