protk 1.4.2 → 1.4.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
- metadata.gz: 7377c1480498f852b7e747d13e9a7d985523fcef
- data.tar.gz: 2cb2c652e53ec636fb521cb35a687324ee810af8
+ metadata.gz: 31df514a2203236ea9ac25f8d5cc9c282378e04d
+ data.tar.gz: f1bb438ef01003afc166eb5b0342dbd8f6ecd09b
 SHA512:
- metadata.gz: c4e72457cc9ada490ea6210c9d13e6e5d240c0d19399c5b015feb30cebb881b51bae62cb8d4aa831d8aee397a747d5c16b1433f0ea08656474a56270f709b3d7
- data.tar.gz: e8893c4fda75666fdf4ed3cb6d6dc0bb34ceb3041c2b19883e82891aa5245a4a7aa99cc4fe677f2fb6f28c2b12d5bbb89a213a59e3f1f0e4d8dc927a9f6ff510
+ metadata.gz: 7f7f0fe81411f17b89037162ad7bf5374be69309888bda2f84b058d69773dd1790469d40dd5797e0306ce49ecc95426cfb80b65bfcb95ac0a052be90db40ea42
+ data.tar.gz: f2db6018ac90079e925f7c5c071be3fbedd77bc3787cfc58a91aa38048f43f60426020a3bff3a5a58c4bea6d71aaad7c0ab10437e050eaf1998792ca6ab2e1dd
data/README.md CHANGED
@@ -3,8 +3,6 @@
 
 # protk ( Proteomics toolkit )
 
-
- ***
 ## What is it?
 
 Protk is a suite of tools for proteomics. It aims to present a simple and consistent command-line interface across otherwise disparate third party tools. The following analysis tasks are currently supported;
@@ -68,15 +66,27 @@ By default protk will install tools and databases into `.protk` in your home dir
 
 Many protk tools have equivalent galaxy wrappers available on the [galaxy toolshed](http://toolshed.g2.bx.psu.edu/) with source code and development occurring in the [protk-galaxytools](https://github.com/iracooke/protk-galaxytools) repository on github. In order for these tools to work you will also need to make sure that protk, as well as the necessary third party dependencies, are available to galaxy during tool execution.
 
- There are two ways to do this
+ There are three ways to do this
 
 **Using Docker:**
 
 By far the easiest way to do this is to set up your Galaxy instance to run tools in Docker containers. All the tools in the [protk-galaxytools](https://github.com/iracooke/protk-galaxytools) repository are designed to work with [this](https://github.com/iracooke/protk-dockerfile) docker image, and will download and use the image automatically on appropriately configured Galaxy instances.
 
+ **Using the Galaxy Tool Shed (Experimental)**
+
+ An installation recipe for `protk` is available from the [Galaxy Tool Shed](https://testtoolshed.g2.bx.psu.edu/view/iuc/package_protk_1_4_2/). If you want to depend on protk for your own Galaxy wrapper, create a `tool_dependencies.xml` file with the following content.
+
+ ```xml
+ <tool_dependency>
+ <package name="protk" version="1.4.2">
+ <repository name="package_protk_1_4_2" owner="iuc"/>
+ </package>
+ </tool_dependency>
+ ```
+
 **Manual Install**
 
- If your galaxy instance is unable to use Docker for some reason you will need to install `protk` and its dependencies manually.
+ If your galaxy instance is unable to use Docker or the Tool Shed for some reason, you will need to install `protk` and its dependencies manually.
 
 One way to install protk would be to just do `gem install protk` using the default system ruby (without rvm). This will probably just work, however you will lose the ability to run specific versions of tools against specific versions of protk. The recommended method of installing protk for use with galaxy is as follows;
 
@@ -98,13 +108,13 @@ One way to install protk would be to just do `gem install protk` using the default
 
 4. Install protk in an isolated gemset using rvm.
 
- This sets up an isolated environment where only a specific version of protk is available. We name the environment according to the protk intermediate version numer (1.4 in this example). Minor bugfixes will be released as 1.4.x and can be installed without updating the toolshed wrappers
+ This sets up an isolated environment where only a specific version of protk is available. We name the environment according to the protk version number (1.4.2 in this example).
 
 ```bash
 rvm 2.1
- rvm gemset create protk1.4
- rvm use 2.1@protk1.4
- gem install protk -v '~>1.4'
+ rvm gemset create protk1.4.2
+ rvm use 2.1@protk1.4.2
+ gem install protk -v '~>1.4.2'
 ```
 
 5. Configure Galaxy's tool dependency directory.
@@ -124,11 +134,11 @@ One way to install protk would be to just do `gem install protk` using the default
 cd <tool_dependency_dir>
 mkdir protk
 cd protk
- mkdir 1.4
- ln -s 1.4 default
- rvm use 2.1@protk1.4
- rvmenv=`rvm env --path 2.1@protk1.4`
- echo ". $rvmenv" > 1.4/env.sh
+ mkdir 1.4.2
+ ln -s 1.4.2 default
+ rvm use 2.1@protk1.4.2
+ rvmenv=`rvm env --path 2.1@protk1.4.2`
+ echo ". $rvmenv" > 1.4.2/env.sh
 ```
 
 7. Keep things up to date
@@ -137,14 +147,14 @@ One way to install protk would be to just do `gem install protk` using the default
 
 ```bash
 rvm 2.1
- rvm gemset create protk1.5
- rvm use 2.1@protk1.5
- gem install protk -v '~>1.5'
+ rvm gemset create protk1.5.0
+ rvm use 2.1@protk1.5.0
+ gem install protk -v '~>1.5.0'
 cd <tool_dependency_dir>/protk/
- mkdir 1.5
- rvmenv=`rvm env --path 2.1@protk1.5`
- echo ". $rvmenv" > 1.5/env.sh
- ln -s 1.5 default
+ mkdir 1.5.0
+ rvmenv=`rvm env --path 2.1@protk1.5.0`
+ echo ". $rvmenv" > 1.5.0/env.sh
+ ln -s 1.5.0 default
 ```
 
 ## Sequence databases
@@ -166,3 +176,4 @@ You should now be able to run database searches, specifying this database by usi
 manage_db.rb add --ftp-source 'ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/reldate.txt' --include-filters '/OS=Homo\ssapiens/' --id-regex 'sp\|.*\|(.*?)\s' --add-decoys --make-blast-index --archive-old sphuman
 ```
 
+
@@ -0,0 +1,74 @@
1
+ #!/usr/bin/env ruby
2
+ #
3
+ # This file is part of protk
4
+ # Created by Ira Cooke 22/5/2015
5
+ #
6
+ # Filters a fasta file so only entries matching a condition are emitted
7
+ #
8
+
9
+ require 'protk/constants'
10
+ require 'protk/command_runner'
11
+ require 'protk/tool'
12
+ require 'set'
13
+ require 'bio'
14
+
15
+ tool=Tool.new([:explicit_output])
16
+ tool.option_parser.banner = "Filter entries in a fasta file.\n\nUsage: filter_fasta.rb [options] file.fasta file2.fasta"
17
+ tool.add_value_option(:definition_filter,nil,['--definition filter','Keep entries matching definition'])
18
+ tool.add_boolean_option(:invert,false,['--invert',"Invert Filter"])
19
+ tool.add_value_option(:id_filter,nil,['-I filename','--id-filter filename',"Keep entries with given identifiers"])
20
+
21
+ exit unless tool.check_options(true)
22
+
23
+ input_file=ARGV[0]
24
+
25
+ output_fh = tool.explicit_output!=nil ? File.new("#{tool.explicit_output}",'w') : $stdout
26
+
27
+ $filter_ids = Set.new()
28
+ if tool.id_filter && (File.exists?(tool.id_filter) || tool.id_filter=="-")
29
+ if tool.id_filter=="-"
30
+ $filter_ids = $stdin.read.split("\n").collect { |e| e.chomp }
31
+ else
32
+ $filter_ids = File.readlines(tool.id_filter).collect { |e| e.chomp }
33
+ end
34
+ $filter_ids = Set.new($filter_ids) # Much faster set include than array include
35
+ end
36
+
37
+ def passes_filters(entry,tool)
38
+
39
+ if tool.definition_filter
40
+ if entry.definition =~ /#{tool.definition_filter}/
41
+ return true
42
+ else
43
+ return false
44
+ end
45
+ end
46
+
47
+ if $filter_ids.length > 0
48
+ # require 'byebug';byebug
49
+ if $filter_ids.include? entry.entry_id
50
+ return true
51
+ end
52
+ return false
53
+ end
54
+
55
+ # Always true if there are no filters defined
56
+
57
+ return true
58
+
59
+ end
60
+
61
+
62
+ ARGV.each do |fasta_file|
63
+
64
+ file = Bio::FastaFormat.open(fasta_file.chomp)
65
+ file.each do |entry|
66
+
67
+
68
+ pass = passes_filters(entry,tool)
69
+ pass = !pass if tool.invert
70
+ if pass
71
+ output_fh.write entry
72
+ end
73
+ end
74
+ end
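A hedged usage sketch for the new filter_fasta.rb script above, based on its banner and options (`--definition`, `--invert`, `-I/--id-filter`); the file names and identifier list are hypothetical, and filtered entries go to stdout when no explicit output option is given.

```bash
# Keep only entries whose definition line matches "Homo sapiens"
filter_fasta.rb --definition 'Homo sapiens' uniprot.fasta > human_only.fasta

# Keep entries whose identifiers are listed in ids.txt (one per line);
# pass "-" to read the identifier list from stdin instead
filter_fasta.rb --id-filter ids.txt uniprot.fasta > subset.fasta

# Invert the filter: drop, rather than keep, matching entries
filter_fasta.rb --definition 'Fragment' --invert uniprot.fasta > no_fragments.fasta
```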
@@ -0,0 +1,109 @@
1
+ #!/usr/bin/env ruby
2
+ #
3
+ # This file is part of protk
4
+ # Created by Ira Cooke 24/6/2015
5
+ #
6
+ # Filters a pepxml file by removing or keeping only psms that match a filter
7
+ #
8
+
9
+ require 'protk/constants'
10
+ require 'protk/command_runner'
11
+ require 'protk/tool'
12
+ require 'bio'
13
+ require 'libxml'
14
+
15
+ include LibXML
16
+
17
+ tool=Tool.new([:explicit_output,:debug])
18
+ tool.option_parser.banner = "Filter psms in a pepxml file.\n\nUsage: filter_psms.rb [options] expression file.pepxml"
19
+ tool.add_value_option(:filter,"protein",['-A','--attribute name',"Match expression against a specific search_hit attribute"])
20
+ tool.add_boolean_option(:check_alternative_proteins,false,['-C','--check-alternatives',"Also match expression against to alternative_proteins"])
21
+ tool.add_boolean_option(:reject_mode,false,['-R','--reject',"Keep mismatches instead of matches"])
22
+
23
+ exit unless tool.check_options(true,[:filter])
24
+
25
+ if ARGV.length!=2
26
+ puts "Wrong number of arguments. You must supply a filter expression and a pepxml file"
27
+ exit(1)
28
+ end
29
+
30
+ expressions=ARGV[0].split(",").map(&:strip)
31
+ input_file=ARGV[1]
32
+
33
+ $protk = Constants.instance
34
+ log_level = tool.debug ? "info" : "warn"
35
+ $protk.info_level= log_level
36
+
37
+
38
+ output_fh = tool.explicit_output!=nil ? File.new("#{tool.explicit_output}",'w') : $stdout
39
+
40
+ throw "Input file #{input_file} does not exist" unless File.exist? "#{input_file}"
41
+
42
+ XML::Error.set_handler(&XML::Error::QUIET_HANDLER)
43
+
44
+ doc = XML::Document.file("#{input_file}")
45
+ reader = XML::Reader.document(doc)
46
+
47
+
48
+ # First print out the header (ie before spectrum_queries)
49
+ File.foreach("#{input_file}") do |line|
50
+ if line =~ /\<spectrum_query/
51
+ break;
52
+ else
53
+ output_fh.write line
54
+ end
55
+ end
56
+
57
+ pepxml_ns_prefix="xmlns:"
58
+ pepxml_ns="xmlns:http://regis-web.systemsbiology.net/pepXML"
59
+
60
+ kept=0
61
+ deleted=0
62
+ scanned=0
63
+
64
+ while reader.read
65
+
66
+ if reader.name == "spectrum_query"
67
+ sq_node = reader.expand
68
+
69
+ hits = sq_node.find("./#{pepxml_ns_prefix}search_result/#{pepxml_ns_prefix}search_hit[@hit_rank=\"1\"]",pepxml_ns)
70
+
71
+ throw "More than one first ranked search hit" if hits.length>1
72
+ throw "No search hit for spectrum_query" if hits.length==0
73
+
74
+ hit = hits[0]
75
+
76
+ has_match = expressions.collect { |expression| (hit.attributes[tool.filter] =~ /#{expression}/) }.any?
77
+
78
+ if !has_match && tool.check_alternative_proteins
79
+ alts = hit.find("./#{pepxml_ns_prefix}alternative_protein",pepxml_ns)
80
+
81
+ # Check alternative proteins
82
+ alt_expr = alts.collect { |alt| expressions.collect { |expression| (alt.attributes[tool.filter] =~ /#{expression}/ )}}
83
+
84
+ has_match = alt_expr.flatten.any?
85
+ end
86
+
87
+ if (has_match && !tool.reject_mode) || (!has_match && tool.reject_mode) #&& (hit.attributes['hit_rank']=="1")
88
+ kept+=1
89
+
90
+ # Remove any lower ranked hits
91
+ #
92
+ secondary_hits = sq_node.find("./#{pepxml_ns_prefix}search_result/#{pepxml_ns_prefix}search_hit[@hit_rank!=\"1\"]",pepxml_ns)
93
+ secondary_hits.each { |sh| sh.remove! }
94
+
95
+ output_fh.write "#{sq_node}\n"
96
+ else
97
+ deleted+=1
98
+ end
99
+
100
+ scanned+=1
101
+
102
+ reader.next_sibling
103
+ end
104
+
105
+ end
106
+
107
+ output_fh.write "</msms_run_summary>\n</msms_pipeline_analysis>\n"
108
+
109
+ $protk.log "Kept #{kept} and deleted #{deleted}" , :info
@@ -41,17 +41,17 @@ search_tool.options.instrument=0
41
41
 
42
42
  # MS-GF+ doesnt support fragment tol so add this manually rather than via the SearchTool defaults
43
43
  search_tool.add_value_option(:precursor_tol,"20",['-p','--precursor-ion-tol tol', 'Precursor ion mass tolerance.'])
44
- search_tool.add_value_option(:precursor_tolu,"ppm",['--precursor-ion-tol-units tolu', 'Precursor ion mass tolerance units (ppm or Da). Default=ppm'])
44
+ search_tool.add_value_option(:precursor_tolu,"ppm",['--precursor-ion-tol-units tolu', 'Precursor ion mass tolerance units (ppm or Da).'])
45
45
 
46
46
  search_tool.add_boolean_option(:pepxml,false,['--pepxml', 'Convert results to pepxml.'])
47
- search_tool.add_value_option(:isotope_error_range,"0,1",['--isotope-error-range range', 'Takes into account of the error introduced by chooosing a non-monoisotopic peak for fragmentation.(Default 0,1)'])
47
+ search_tool.add_value_option(:isotope_error_range,"0,1",['--isotope-error-range range', 'Takes into account of the error introduced by chooosing a non-monoisotopic peak for fragmentation.'])
48
48
  search_tool.add_value_option(:fragment_method,0,['--fragment-method method', 'Fragment method 0: As written in the spectrum or CID if no info (Default), 1: CID, 2: ETD, 3: HCD, 4: Merge spectra from the same precursor'])
49
49
  search_tool.add_boolean_option(:decoy_search,false,['--decoy-search', 'Build and search a decoy database on the fly. Input db should not contain decoys if this option is used'])
50
50
  search_tool.add_value_option(:protocol,0,['--protocol p', '0: NoProtocol (Default), 1: Phosphorylation, 2: iTRAQ, 3: iTRAQPhospho'])
51
- search_tool.add_value_option(:min_pep_length,6,['--min-pep-length p', 'Minimum peptide length to consider, Default: 6'])
52
- search_tool.add_value_option(:max_pep_length,40,['--max-pep-length p', 'Maximum peptide length to consider, Default: 40'])
53
- search_tool.add_value_option(:min_pep_charge,2,['--min-pep-charge c', 'Minimum precursor charge to consider if charges are not specified in the spectrum file, Default: 2'])
54
- search_tool.add_value_option(:max_pep_charge,3,['--max-pep-charge c', 'Maximum precursor charge to consider if charges are not specified in the spectrum file, Default: 3'])
51
+ search_tool.add_value_option(:min_pep_length,6,['--min-pep-length p', 'Minimum peptide length to consider'])
52
+ search_tool.add_value_option(:max_pep_length,40,['--max-pep-length p', 'Maximum peptide length to consider'])
53
+ search_tool.add_value_option(:min_pep_charge,2,['--min-pep-charge c', 'Minimum precursor charge to consider if charges are not specified in the spectrum file'])
54
+ search_tool.add_value_option(:max_pep_charge,3,['--max-pep-charge c', 'Maximum precursor charge to consider if charges are not specified in the spectrum file'])
55
55
  search_tool.add_value_option(:num_reported_matches,1,['--num-reported-matches n', 'Number of matches per spectrum to be reported, Default: 1'])
56
56
  search_tool.add_boolean_option(:add_features,false,['--add-features', 'output additional features'])
57
57
  search_tool.add_value_option(:java_mem,"3500M",['--java-mem mem','Java memory limit when running the search (Default 3.5Gb)'])
@@ -208,23 +208,32 @@ ARGV.each do |filename|
208
208
  cmd << " -mod #{mods_path}"
209
209
  end
210
210
 
211
+
211
212
  # As a final part of the command we convert to pepxml
212
213
  if search_tool.pepxml
213
214
  #if search_tool.explicit_output
214
215
  cmd << ";ruby -pi.bak -e \"gsub('post=\\\"?','post=\\\"X')\" #{mzid_output_path}"
215
216
  cmd << ";ruby -pi.bak -e \"gsub('pre=\\\"?','pre=\\\"X')\" #{mzid_output_path}"
216
- cmd << ";idconvert #{mzid_output_path} --pepXML -o #{Pathname.new(mzid_output_path).dirname}"
217
+ cmd << ";ruby -pi.bak -e \"gsub('id=\\\"UnspecificCleavage\\\"','id=\\\"UnspecificCleavage\\\" name=\\\"unspecific cleavage\\\"')\" #{mzid_output_path}"
218
+
219
+ idconvert_relative_output_dir = (0...10).map { ('a'..'z').to_a[rand(26)] }.join
220
+
221
+ # require 'byebug';byebug
222
+
223
+ idconvert_output_dir = "#{Pathname.new(mzid_output_path).dirname}/#{idconvert_relative_output_dir}"
224
+ cmd << ";idconvert #{mzid_output_path} --pepXML -o #{idconvert_output_dir}"
217
225
 
218
226
 
219
- pepxml_output_path = "#{mzid_output_path.chomp('.mzid')}.pepXML"
227
+ cmd << "; pep_xml_output_path=`ls #{idconvert_output_dir}/*.pepXML`; echo $pep_xml_output_path"
228
+ #"#{mzid_output_path.chomp('.mzid')}.pepXML"
220
229
 
221
230
  # Fix the msms_run_summary base_name attribute
222
231
  #
223
232
  if for_galaxy
224
- cmd << ";ruby -pi.bak -e \"gsub(/ base_name=[^ ]+/,' base_name=\\\"#{original_input_file}\\\"')\" #{pepxml_output_path}"
233
+ cmd << ";ruby -pi.bak -e \"gsub(/ base_name=[^ ]+/,' base_name=\\\"#{original_input_file}\\\"')\" $pep_xml_output_path"
225
234
  end
226
235
  #Then copy the pepxml to the final output path
227
- cmd << "; mv #{pepxml_output_path} #{output_path}"
236
+ cmd << "; mv ${pep_xml_output_path} '#{output_path}'"
228
237
  else
229
238
  cmd << "; mv #{mzid_output_path} #{output_path}"
230
239
  end
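The change above routes the idconvert output into a randomly named subdirectory and then recovers the generated pepXML with `ls` before moving it to the requested output path. A minimal shell sketch of that appended sequence, with assumed file names standing in for the values interpolated by the script:

```bash
# Sketch of the appended conversion steps (result.mzid / result.pep.xml are assumed names)
outdir="$(dirname result.mzid)/abcdefghij"        # random 10-letter subdirectory in the real command
idconvert result.mzid --pepXML -o "$outdir"       # ProteoWizard idconvert writes the pepXML here
pep_xml_output_path=$(ls "$outdir"/*.pepXML)      # recover the generated file name
mv "$pep_xml_output_path" result.pep.xml          # move it to the final output path
```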
@@ -29,7 +29,7 @@ tool.option_parser.banner = "Convert an mzIdentML file to protXML.\n\nUsage: mzi
29
29
  exit unless tool.check_options(true)
30
30
 
31
31
  $protk = Constants.instance
32
- log_level = tool.debug ? "info" : "warn"
32
+ log_level = tool.debug ? "debug" : "info"
33
33
  $protk.info_level= log_level
34
34
 
35
35
  input_file=ARGV[0]
@@ -42,6 +42,7 @@ end
42
42
 
43
43
  prot_xml_writer = ProtXMLWriter.new
44
44
 
45
+ $protk.log "Parsing MzIdentML input file" , :info
45
46
  mzid_doc = MzIdentMLDoc.new(input_file)
46
47
 
47
48
  protein_groups = mzid_doc.protein_groups
@@ -59,11 +60,13 @@ protein_groups.each do |group_node|
59
60
  $stdout.write "Scanned #{i} and read #{n_written} of #{n_prots}\r"
60
61
  end
61
62
 
62
- # require 'byebug';byebug
63
- group_prob = MzIdentMLDoc.get_cvParam(group_node,"MS:1002470").attributes['value'].to_f*0.01
64
63
 
65
- if group_prob > tool.minprob.to_f
66
- group = ProteinGroup.from_mzid(group_node)
64
+ group_prob = mzid_doc.get_cvParam(group_node,"MS:1002470").attributes['value'].to_f*0.01
65
+
66
+ if group_prob >= tool.minprob.to_f
67
+ $stdout.write "\n" if tool.debug
68
+ $protk.log "Writing group with probability #{group_prob}" , :info
69
+ group = ProteinGroup.from_mzid(group_node,mzid_doc,tool.minprob.to_f)
67
70
  prot_xml_writer.append_protein_group(group.as_protxml)
68
71
  n_written+=1
69
72
  end
@@ -44,6 +44,7 @@ prophet_tool.add_boolean_option(:force_fit,false,['--force-fit',"Force fitting o
44
44
  prophet_tool.add_boolean_option(:allow_alt_instruments,false,['--allow-alt-instruments',"Warning instead of exit with error if instrument types between runs is different"])
45
45
  prophet_tool.add_boolean_option(:one_ata_time,false,['-F', '--one-ata-time', 'Create a separate pproph output file for each analysis'])
46
46
  prophet_tool.add_value_option(:decoy_prefix,"decoy",['--decoy-prefix prefix', 'Prefix for decoy sequences'])
47
+ prophet_tool.add_boolean_option(:use_non_parametric_model,false,['--use-non-parametric-model', 'Use Non-parametric model, can only be used with decoy option'])
47
48
  prophet_tool.add_boolean_option(:no_decoys,false,['--no-decoy', 'Don\'t use decoy sequences to pin down the negative distribution'])
48
49
  prophet_tool.add_value_option(:experiment_label,nil,['--experiment-label label','used to commonly label all spectra belonging to one experiment (required by iProphet)'])
49
50
 
@@ -194,7 +195,7 @@ def generate_command(genv,prophet_tool,inputs,output,database,engine,enzyme)
194
195
  if prophet_tool.useicat
195
196
  cmd << " -Oi "
196
197
  else
197
- cmd << " -Of"
198
+ cmd << " -Of "
198
199
  end
199
200
 
200
201
  if prophet_tool.maldi
@@ -209,6 +210,10 @@ def generate_command(genv,prophet_tool,inputs,output,database,engine,enzyme)
209
210
  cmd << " -d#{prophet_tool.decoy_prefix} -Od "
210
211
  end
211
212
 
213
+ if prophet_tool.use_non_parametric_model
214
+ cmd << " -OP "
215
+ end
216
+
212
217
  cmd << " -p#{prophet_tool.probability_threshold}"
213
218
 
214
219
  if ( inputs.class==Array)
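The new `--use-non-parametric-model` option adds `-OP` to the generated PeptideProphet command; per its help text it should be combined with the decoy option. A hypothetical invocation sketch, assuming the wrapper is invoked as `peptide_prophet.rb` and that the input file name is a placeholder:

```bash
# Non-parametric model, with decoy sequences (prefixed "decoy") pinning the negative distribution
peptide_prophet.rb --decoy-prefix decoy --use-non-parametric-model sample.pep.xml
```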
@@ -58,6 +58,15 @@ def protein_id_to_protdbid(protein_id)
58
58
  return protein_id
59
59
  end
60
60
 
61
+ def protein_is_included(protein,protein_probability_threshold,ignore_regex)
62
+ pass_probability_thresh = (protein.probability >= protein_probability_threshold)
63
+ pass_regex = true
64
+ if ignore_regex && (protein.protein_name =~ /#{ignore_regex}/)
65
+ pass_regex=false
66
+ end
67
+ return (pass_regex && pass_probability_thresh)
68
+ end
69
+
61
70
  def prepare_fasta(database_path,type)
62
71
  db_filename = nil
63
72
  case
@@ -91,6 +100,7 @@ tool.add_value_option(:peptide_probability_threshold,0.95,['--threshold prob','P
91
100
  tool.add_value_option(:protein_probability_threshold,0.99,['--prot-threshold prob','Protein Probability Threshold (Default 0.99)'])
92
101
  tool.add_value_option(:gff_idregex,nil,['--gff-idregex pre','Regex with capture group for parsing gff ids from protein ids'])
93
102
  tool.add_value_option(:genome_idregex,nil,['--genome-idregex pre','Regex with capture group for parsing genomic ids from protein ids'])
103
+ tool.add_value_option(:ignore_regex,nil,['--ignore-regex pre','Regex to match protein ids that we should ignore completely'])
94
104
 
95
105
  exit unless tool.check_options(true,[:database,:coords_file])
96
106
 
@@ -126,7 +136,7 @@ num_missing_gff_entries = 0
126
136
 
127
137
  proteins.each do |protein|
128
138
 
129
- if protein.probability >= tool.protein_probability_threshold
139
+ if protein_is_included(protein,tool.protein_probability_threshold.to_f,tool.ignore_regex)
130
140
 
131
141
  begin
132
142
  $protk.log "Mapping #{protein.protein_name}", :info
@@ -155,7 +165,7 @@ proteins.each do |protein|
155
165
  peptides = tool.stack_charge_states ? protein.peptides : protein.representative_peptides
156
166
 
157
167
  peptides.each do |peptide|
158
- if peptide.probability >= tool.peptide_probability_threshold
168
+ if peptide.probability >= tool.peptide_probability_threshold.to_f
159
169
  peptide_entries = peptide.to_gff3_records(protein_entry.aaseq,gff_parent_entry,gff_cds_entries)
160
170
  peptide_entries.each do |peptide_entry|
161
171
  output_fh.write peptide_entry.to_s
@@ -17,7 +17,7 @@ class Constants
17
17
 
18
18
  # These are logger attributes with thresholds as indicated
19
19
  # DEBUG < INFO < WARN < ERROR < FATAL < UNKNOWN
20
- #Debug (development mode) or Info (production)
20
+ # Debug (development mode) or Info (production)
21
21
  #
22
22
  @stdout_logger
23
23
 
@@ -7,6 +7,31 @@ class MzIdentMLDoc < Object
7
7
  MZID_NS_PREFIX="mzidentml"
8
8
  MZID_NS='http://psidev.info/psi/pi/mzIdentML/1.1'
9
9
 
10
+ attr :psms_cache
11
+ attr :db_sequence_cache
12
+
13
+ def psms_cache
14
+ if !@psms_cache
15
+ @psms_cache={}
16
+ Constants.instance.log "Generating psm index" , :debug
17
+ self.psms.each do |spectrum_identification_item|
18
+ @psms_cache[spectrum_identification_item.attributes['id']]=spectrum_identification_item
19
+ end
20
+ end
21
+ @psms_cache
22
+ end
23
+
24
+ def dbsequence_cache
25
+ if !@dbsequence_cache
26
+ @dbsequence_cache={}
27
+ Constants.instance.log "Generating DB index" , :debug
28
+ self.dbsequences.each do |db_sequence|
29
+ @dbsequence_cache[db_sequence.attributes['accession']]=db_sequence
30
+ end
31
+ end
32
+ @dbsequence_cache
33
+ end
34
+
10
35
  def initialize(path)
11
36
  parser=XML::Parser.file(path)
12
37
  @document=parser.parse
@@ -25,6 +50,10 @@ class MzIdentMLDoc < Object
25
50
  @document.find("//#{MZID_NS_PREFIX}:SpectrumIdentificationItem","#{MZID_NS_PREFIX}:#{MZID_NS}")
26
51
  end
27
52
 
53
+ def dbsequences
54
+ @document.find("//#{MZID_NS_PREFIX}:DBSequence","#{MZID_NS_PREFIX}:#{MZID_NS}")
55
+ end
56
+
28
57
  def protein_groups
29
58
  @document.find("//#{MZID_NS_PREFIX}:ProteinAmbiguityGroup","#{MZID_NS_PREFIX}:#{MZID_NS}")
30
59
  end
@@ -55,17 +84,22 @@ class MzIdentMLDoc < Object
55
84
  node.find("#{pp}#{MZID_NS_PREFIX}:#{expression}","#{MZID_NS_PREFIX}:#{MZID_NS}")
56
85
  end
57
86
 
87
+ def find(node,expression,root=false)
88
+ MzIdentMLDoc.find(node,expression,root)
89
+ end
90
+
58
91
 
59
- def self.get_cvParam(mzidnode,accession)
92
+ def get_cvParam(mzidnode,accession)
60
93
  self.find(mzidnode,"cvParam[@accession=\'#{accession}\']")[0]
61
94
  end
62
95
 
63
- def self.get_dbsequence(mzidnode,accession)
64
- self.find(mzidnode,"DBSequence[@accession=\'#{accession}\']",true)[0]
96
+ def get_dbsequence(mzidnode,accession)
97
+ self.dbsequence_cache[accession]
98
+ # self.find(mzidnode,"DBSequence[@accession=\'#{accession}\']",true)[0]
65
99
  end
66
100
 
67
101
  # As per PeptideShaker. Assume group probability used for protein if it is group rep otherwise 0
68
- def self.get_protein_probability(protein_node)
102
+ def get_protein_probability(protein_node)
69
103
 
70
104
  #MS:1002403
71
105
  is_group_representative=(self.get_cvParam(protein_node,"MS:1002403")!=nil)
@@ -76,28 +110,38 @@ class MzIdentMLDoc < Object
76
110
  end
77
111
  end
78
112
 
79
- def self.get_proteins_for_group(group_node)
80
- self.find(group_node,"ProteinDetectionHypothesis")
113
+ # Memoized because it gets called for every protein in a group
114
+ def get_proteins_for_group(group_node)
115
+ @proteins_for_group_cache ||= Hash.new do |h,key|
116
+ h[key] = self.find(group_node,"ProteinDetectionHypothesis")
117
+ end
118
+ @proteins_for_group_cache[group_node]
81
119
  end
82
120
 
83
121
  # def self.get_sister_proteins(protein_node)
84
122
  # self.find(protein_node.parent,"ProteinDetectionHypothesis")
85
123
  # end
86
124
 
87
- def self.get_peptides_for_protein(protein_node)
125
+ def get_peptides_for_protein(protein_node)
88
126
  self.find(protein_node,"PeptideHypothesis")
89
127
  end
90
128
 
91
129
  # <PeptideHypothesis peptideEvidence_ref="PepEv_1">
92
130
  # <SpectrumIdentificationItemRef spectrumIdentificationItem_ref="SII_1_1"/>
93
131
  # </PeptideHypothesis>
94
- def self.get_best_psm_for_peptide(peptide_node)
132
+ def get_best_psm_for_peptide(peptide_node)
133
+
134
+
95
135
 
96
136
  best_score=-1
97
137
  best_psm=nil
98
- self.find(peptide_node,"SpectrumIdentificationItemRef").each do |id_ref_node|
138
+ spectrumidrefs = self.find(peptide_node,"SpectrumIdentificationItemRef")
139
+ Constants.instance.log "Searching from among #{spectrumidrefs.length} for best psm" , :debug
140
+
141
+ spectrumidrefs.each do |id_ref_node|
99
142
  id_ref = id_ref_node.attributes['spectrumIdentificationItem_ref']
100
- psm_node = self.find(peptide_node,"SpectrumIdentificationItem[@id=\'#{id_ref}\']",true)[0]
143
+ # psm_node = self.find(peptide_node,"SpectrumIdentificationItem[@id=\'#{id_ref}\']",true)[0]
144
+ psm_node = self.psms_cache[id_ref]
101
145
  score = self.get_cvParam(psm_node,"MS:1002466")['value'].to_f
102
146
  if score>best_score
103
147
  best_psm=psm_node
@@ -107,7 +151,7 @@ class MzIdentMLDoc < Object
107
151
  best_psm
108
152
  end
109
153
 
110
- def self.get_sequence_for_peptide(peptide_node)
154
+ def get_sequence_for_peptide(peptide_node)
111
155
  evidence_ref = peptide_node.attributes['peptideEvidence_ref']
112
156
  pep_ref = peptide_node.find("//#{MZID_NS_PREFIX}:PeptideEvidence[@id=\'#{evidence_ref}\']","#{MZID_NS_PREFIX}:#{MZID_NS}")[0].attributes['peptide_ref']
113
157
  peptide=peptide_node.find("//#{MZID_NS_PREFIX}:Peptide[@id=\'#{pep_ref}\']","#{MZID_NS_PREFIX}:#{MZID_NS}")[0]
@@ -115,13 +159,13 @@ class MzIdentMLDoc < Object
115
159
  peptide.find("./#{MZID_NS_PREFIX}:PeptideSequence","#{MZID_NS_PREFIX}:#{MZID_NS}")[0].content
116
160
  end
117
161
 
118
- def self.get_sequence_for_psm(psm_node)
162
+ def get_sequence_for_psm(psm_node)
119
163
  pep_ref = psm_node.attributes['peptide_ref']
120
164
  peptide=psm_node.find("//#{MZID_NS_PREFIX}:Peptide[@id=\'#{pep_ref}\']","#{MZID_NS_PREFIX}:#{MZID_NS}")[0]
121
165
  peptide.find("./#{MZID_NS_PREFIX}:PeptideSequence","#{MZID_NS_PREFIX}:#{MZID_NS}")[0].content
122
166
  end
123
167
 
124
- def self.get_peptide_evidence_from_psm(psm_node)
168
+ def get_peptide_evidence_from_psm(psm_node)
125
169
  pe_nodes = []
126
170
  self.find(psm_node,"PeptideEvidenceRef").each do |pe_node|
127
171
  ev_id=pe_node.attributes['peptideEvidence_ref']
@@ -45,15 +45,15 @@ class Peptide
45
45
  # <cvParam cvRef="PSI-MS" accession="MS:1001093" name="sequence coverage" value="0.0"/>
46
46
  # </ProteinDetectionHypothesis>
47
47
 
48
- def from_mzid(xmlnode)
48
+ def from_mzid(xmlnode,mzid_doc)
49
49
  pep=new()
50
- pep.sequence=MzIdentMLDoc.get_sequence_for_peptide(xmlnode)
51
- best_psm = MzIdentMLDoc.get_best_psm_for_peptide(xmlnode)
50
+ pep.sequence=mzid_doc.get_sequence_for_peptide(xmlnode)
51
+ best_psm = mzid_doc.get_best_psm_for_peptide(xmlnode)
52
52
  # require 'byebug';byebug
53
- pep.probability = MzIdentMLDoc.get_cvParam(best_psm,"MS:1002466")['value'].to_f
54
- pep.theoretical_neutral_mass = MzIdentMLDoc.get_cvParam(best_psm,"MS:1001117")['value'].to_f
53
+ pep.probability = mzid_doc.get_cvParam(best_psm,"MS:1002466")['value'].to_f
54
+ pep.theoretical_neutral_mass = mzid_doc.get_cvParam(best_psm,"MS:1001117")['value'].to_f
55
55
  pep.charge = best_psm.attributes['chargeState'].to_i
56
- pep.protein_name = MzIdentMLDoc.get_dbsequence(xmlnode.parent,xmlnode.parent.attributes['dBSequence_ref']).attributes['accession']
56
+ pep.protein_name = mzid_doc.get_dbsequence(xmlnode.parent,xmlnode.parent.attributes['dBSequence_ref']).attributes['accession']
57
57
 
58
58
  # pep.charge = MzIdentMLDoc.get_charge_for_psm(best_psm)
59
59
 
@@ -42,7 +42,9 @@ class ProphetTool < SearchTool
42
42
  'cnbr' => 'M',
43
43
  'elastase' => 'E',
44
44
  'lysn' => 'L',
45
- 'nonspecific' => 'N'
45
+ 'nonspecific' => 'N',
46
+ 'no enzyme' => 'N',
47
+ 'unspecific cleavage' => 'N'
46
48
  }
47
49
  codes[enzyme_name]
48
50
 
@@ -84,26 +84,33 @@ class Protein
84
84
  # This is hacked together to work for a specific PeptideShaker output type
85
85
  # Refactor and properly respect cvParams for real conversion
86
86
  #
87
- def from_mzid(xmlnode)
87
+ def from_mzid(xmlnode,mzid_doc)
88
88
 
89
89
  coverage_cvparam=""
90
90
  prot=new()
91
91
  groupnode = xmlnode.parent
92
92
 
93
93
  prot.group_number=groupnode.attributes['id'].split("_").last.to_i+1
94
- prot.protein_name=MzIdentMLDoc.get_dbsequence(xmlnode,xmlnode.attributes['dBSequence_ref']).attributes['accession']
95
- prot.n_indistinguishable_proteins=MzIdentMLDoc.get_proteins_for_group(groupnode).length
96
- prot.group_probability=MzIdentMLDoc.get_cvParam(groupnode,"MS:1002470").attributes['value'].to_f
94
+ prot.protein_name=mzid_doc.get_dbsequence(xmlnode,xmlnode.attributes['dBSequence_ref']).attributes['accession']
97
95
 
98
- coverage_node=MzIdentMLDoc.get_cvParam(xmlnode,"MS:1001093")
96
+ prot.n_indistinguishable_proteins=mzid_doc.get_proteins_for_group(groupnode).length
97
+
98
+
99
+ prot.group_probability=mzid_doc.get_cvParam(groupnode,"MS:1002470").attributes['value'].to_f
100
+
101
+
102
+ coverage_node=mzid_doc.get_cvParam(xmlnode,"MS:1001093")
99
103
 
100
104
  prot.percent_coverage=coverage_node.attributes['value'].to_f if coverage_node
101
- prot.probability = MzIdentMLDoc.get_protein_probability(xmlnode)
105
+ prot.probability = mzid_doc.get_protein_probability(xmlnode)
102
106
  # require 'byebug';byebug
103
107
 
104
- peptide_nodes=MzIdentMLDoc.get_peptides_for_protein(xmlnode)
108
+ peptide_nodes=mzid_doc.get_peptides_for_protein(xmlnode)
109
+
110
+ prot.peptides = peptide_nodes.collect { |e| Peptide.from_mzid(e,mzid_doc) }
111
+
112
+ Constants.instance.log "Generated protein entry with probability #{prot.probability}" , :debug
105
113
 
106
- prot.peptides = peptide_nodes.collect { |e| Peptide.from_mzid(e) }
107
114
  prot
108
115
  end
109
116
 
@@ -35,18 +35,25 @@ class ProteinGroup
35
35
  # This is hacked together to work for a specific PeptideShaker output type
36
36
  # Refactor and properly respect cvParams for real conversion
37
37
  #
38
- def from_mzid(groupnode)
38
+ def from_mzid(groupnode,mzid_doc,minprob=0)
39
39
 
40
40
  group=new()
41
41
 
42
42
  group.group_number=groupnode.attributes['id'].split("_").last.to_i+1
43
- group.group_probability=MzIdentMLDoc.get_cvParam(groupnode,"MS:1002470").attributes['value'].to_f
43
+ group.group_probability=mzid_doc.get_cvParam(groupnode,"MS:1002470").attributes['value'].to_f
44
44
 
45
45
  # require 'byebug';byebug
46
46
 
47
- protein_nodes=MzIdentMLDoc.get_proteins_for_group(groupnode)
47
+ protein_nodes=mzid_doc.get_proteins_for_group(groupnode)
48
+
49
+
50
+
51
+ group_members = protein_nodes.select do |e|
52
+ mzid_doc.get_protein_probability(e)>=minprob
53
+ end
54
+
55
+ group.proteins = group_members.collect { |e| Protein.from_mzid(e,mzid_doc) }
48
56
 
49
- group.proteins = protein_nodes.collect { |e| Protein.from_mzid(e) }
50
57
  group
51
58
  end
52
59
 
@@ -26,7 +26,7 @@ class PeptideEvidence
26
26
  # dBSequence_ref="JEMP01000193.1_rev_g3500.t1" id="PepEv_1" />
27
27
  class << self
28
28
 
29
- def from_mzid(pe_node)
29
+ def from_mzid(pe_node,mzid_doc)
30
30
  pe = new()
31
31
  pe.peptide_prev_aa=pe_node.attributes['pre']
32
32
  pe.peptide_next_aa=pe_node.attributes['post']
@@ -45,7 +45,7 @@ class PeptideEvidence
45
45
  # name="protein description" value="280755|283436" />
46
46
  # </DBSequence>
47
47
  pe.protein=prot_node.attributes['accession']
48
- pe.protein_descr=MzIdentMLDoc.get_cvParam(prot_node,"MS:1001088")['value']
48
+ pe.protein_descr=mzid_doc.get_cvParam(prot_node,"MS:1001088")['value']
49
49
 
50
50
 
51
51
  # pe.peptide_sequence=pep_node
@@ -163,11 +163,11 @@ class PSM
163
163
 
164
164
 
165
165
 
166
- def from_mzid(psm_node)
166
+ def from_mzid(psm_node,mzid_doc)
167
167
  psm = new()
168
- psm.peptide = MzIdentMLDoc.get_sequence_for_psm(psm_node)
169
- peptide_evidence_nodes = MzIdentMLDoc.get_peptide_evidence_from_psm(psm_node)
170
- psm.peptide_evidence = peptide_evidence_nodes.collect { |pe| PeptideEvidence.from_mzid(pe) }
168
+ psm.peptide = mzid_doc.get_sequence_for_psm(psm_node)
169
+ peptide_evidence_nodes = mzid_doc.get_peptide_evidence_from_psm(psm_node)
170
+ psm.peptide_evidence = peptide_evidence_nodes.collect { |pe| PeptideEvidence.from_mzid(pe,mzid_doc) }
171
171
 
172
172
  psm.calculated_mz = psm_node.attributes['calculatedMassToCharge'].to_f
173
173
  psm.experimental_mz = psm_node.attributes['experimentalMassToCharge'].to_f
@@ -34,13 +34,13 @@ class SearchTool < Tool
34
34
  end
35
35
 
36
36
  if ( option_support.include? :mass_tolerance_units )
37
- add_value_option(:fragment_tolu,"Da",['--fragment-ion-tol-units tolu', 'Fragment ion mass tolerance units (Da or mmu). Default=Da'])
38
- add_value_option(:precursor_tolu,"ppm",['--precursor-ion-tol-units tolu', 'Precursor ion mass tolerance units (ppm or Da). Default=ppm'])
37
+ add_value_option(:fragment_tolu,"Da",['--fragment-ion-tol-units tolu', 'Fragment ion mass tolerance units (Da or mmu).'])
38
+ add_value_option(:precursor_tolu,"ppm",['--precursor-ion-tol-units tolu', 'Precursor ion mass tolerance units (ppm or Da).'])
39
39
  end
40
40
 
41
41
  if ( option_support.include? :mass_tolerance )
42
- add_value_option(:fragment_tol,0.65,['-f', '--fragment-ion-tol tol', 'Fragment ion mass tolerance (unit dependent). Default=0.65'])
43
- add_value_option(:precursor_tol,200,['-p','--precursor-ion-tol tol', 'Precursor ion mass tolerance. Default=200'])
42
+ add_value_option(:fragment_tol,0.65,['-f', '--fragment-ion-tol tol', 'Fragment ion mass tolerance (unit dependent).'])
43
+ add_value_option(:precursor_tol,200,['-p','--precursor-ion-tol tol', 'Precursor ion mass tolerance.'])
44
44
  end
45
45
 
46
46
  if ( option_support.include? :precursor_search_type )
@@ -64,7 +64,7 @@ class SearchTool < Tool
64
64
  end
65
65
 
66
66
  if ( option_support.include? :searched_ions )
67
- add_value_option(:searched_ions,"",['--searched-ions si', 'Ion series to search (default=b,y)'])
67
+ add_value_option(:searched_ions,"",['--searched-ions si', 'Ion series to search'])
68
68
  end
69
69
 
70
70
  if ( option_support.include? :multi_isotope_search )
@@ -86,12 +86,12 @@ class SpectrumQuery
86
86
  # unitAccession="UO:0000010" unitName="seconds" />
87
87
  # </SpectrumIdentificationResult>
88
88
 
89
- def from_mzid(query_node)
89
+ def from_mzid(query_node,mzid_doc)
90
90
  query = new()
91
- query.spectrum_title = MzIdentMLDoc.get_cvParam(query_node,"MS:1000796")['value'].to_s
92
- query.retention_time = MzIdentMLDoc.get_cvParam(query_node,"MS:1000894")['value'].to_f
91
+ query.spectrum_title = mzid_doc.get_cvParam(query_node,"MS:1000796")['value'].to_s
92
+ query.retention_time = mzid_doc.get_cvParam(query_node,"MS:1000894")['value'].to_f
93
93
  items = MzIdentMLDoc.find(query_node,"SpectrumIdentificationItem")
94
- query.psms = items.collect { |item| PSM.from_mzid(item) }
94
+ query.psms = items.collect { |item| PSM.from_mzid(item,mzid_doc) }
95
95
  query
96
96
  end
97
97
 
@@ -26,8 +26,8 @@ class Tool
26
26
  # Options set from the command-line
27
27
  #
28
28
  attr :options, false
29
-
30
- # The option parser used to parse command-line options.
29
+
30
+ # The option parser used to parse command-line options.
31
31
  #
32
32
  attr :option_parser
33
33
 
@@ -62,19 +62,27 @@ class Tool
62
62
  super
63
63
  end
64
64
  end
65
-
66
-
67
- def add_value_option(symbol,default_value,opts)
65
+
66
+ def add_default_to_help(default_value,opts)
67
+ if default_value!=nil && default_value!=" " && default_value!=""
68
+ opts[-1] = "#{opts.last} [#{default_value.to_s}]"
69
+ end
70
+ opts
71
+ end
72
+
73
+ def add_value_option(symbol,default_value,opts)
68
74
  @options[symbol]=default_value
75
+ opts=add_default_to_help(default_value,opts)
69
76
  @option_parser.on(*opts) do |val|
70
77
  @options[symbol]=val
71
78
  @options_defined_by_user[symbol]=opts
72
79
  end
73
80
  end
74
-
81
+
75
82
  def add_boolean_option(symbol,default_value,opts)
76
83
  @options[symbol]=default_value
77
- @option_parser.on(*opts) do
84
+ opts=add_default_to_help(default_value,opts)
85
+ @option_parser.on(*opts) do
78
86
  @options[symbol]=!default_value
79
87
  @options_defined_by_user[symbol]=opts
80
88
  end
@@ -92,10 +100,10 @@ class Tool
92
100
  options.encoding = "utf8"
93
101
  options.transfer_type = :auto
94
102
  options.verbose = false
95
-
103
+
96
104
  @options_defined_by_user={}
97
105
 
98
- @option_parser=OptionParser.new do |opts|
106
+ @option_parser=OptionParser.new do |opts|
99
107
 
100
108
  opts.on( '-h', '--help', 'Display this screen' ) do
101
109
  puts opts
@@ -108,7 +116,7 @@ class Tool
108
116
  end
109
117
 
110
118
  if ( option_support.include? :over_write)
111
- add_boolean_option(:over_write,false,['-r', '--replace-output', 'Dont skip analyses for which the output file already exists'])
119
+ add_boolean_option(:over_write,false,['-r', '--replace-output', 'Dont skip analyses for which the output file already exists'])
112
120
  end
113
121
 
114
122
  if ( option_support.include? :explicit_output )
@@ -120,7 +128,7 @@ class Tool
120
128
  end
121
129
 
122
130
  if ( option_support.include? :database)
123
- add_value_option(:database,"sphuman",['-d', '--database dbname', 'Specify the database to use for this search. Can be a named protk database or the path to a fasta file'])
131
+ add_value_option(:database,"sphuman",['-d', '--database dbname', 'Specify the database to use for this search. Can be a named protk database or the path to a fasta file'])
124
132
  end
125
133
 
126
134
  if (option_support.include? :debug)
@@ -169,37 +177,37 @@ class Tool
169
177
  return true
170
178
  end
171
179
  missing = mandatory.select{ |param| self.send(param).nil? }
172
- if not missing.empty?
173
- puts "Missing options: #{missing.join(', ')}"
174
- puts self.option_parser
175
- return false
176
- end
177
- rescue OptionParser::InvalidOption, OptionParser::MissingArgument
178
- puts $!.to_s
179
- puts self.option_parser
180
- return false
180
+ if not missing.empty?
181
+ puts "Missing options: #{missing.join(', ')}"
182
+ puts self.option_parser
183
+ return false
184
+ end
185
+ rescue OptionParser::InvalidOption, OptionParser::MissingArgument
186
+ puts $!.to_s
187
+ puts self.option_parser
188
+ return false
181
189
  end
182
190
 
183
191
  if ( require_input_file && ARGV[0].nil? )
184
192
  puts "You must supply an input file"
185
- puts self.option_parser
193
+ puts self.option_parser
186
194
  return false
187
195
  end
188
196
 
189
197
  return true
190
- end
191
-
198
+ end
199
+
192
200
  # Run the search tool using the given command string and global environment
193
201
  #
194
202
  def run(cmd,genv,autodelete=true)
195
203
  cmd_runner=CommandRunner.new(genv)
196
204
  cmd_runner.run_local(cmd)
197
205
  end
198
-
206
+
199
207
 
200
208
  def database_info
201
209
  case
202
- when Pathname.new(@options.database).exist? # It's an explicitly named db
210
+ when Pathname.new(@options.database).exist? # It's an explicitly named db
203
211
  db_path=Pathname.new(@options.database).expand_path.to_s
204
212
  db_name=Pathname.new(@options.database).basename.to_s
205
213
  else
@@ -211,4 +219,4 @@ class Tool
211
219
 
212
220
 
213
221
 
214
- end
222
+ end
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: protk
 version: !ruby/object:Gem::Version
- version: 1.4.2
+ version: 1.4.3
 platform: ruby
 authors:
 - Ira Cooke
 autorequire:
 bindir: bin
 cert_chain: []
- date: 2015-05-20 00:00:00.000000000 Z
+ date: 2015-10-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: open4
@@ -36,7 +36,7 @@ dependencies:
 requirements:
 - - ~>
 - !ruby/object:Gem::Version
- version: '1.4'
+ version: 1.4.3
 - - '>='
 - !ruby/object:Gem::Version
 version: 1.4.3
@@ -46,7 +46,7 @@ dependencies:
 requirements:
 - - ~>
 - !ruby/object:Gem::Version
- version: '1.4'
+ version: 1.4.3
 - - '>='
 - !ruby/object:Gem::Version
 version: 1.4.3
@@ -210,6 +210,7 @@ executables:
 - mzid_to_pepxml.rb
 - spectrast_create.rb
 - spectrast_filter.rb
+ - filter_psms.rb
 extensions:
 - ext/decoymaker/extconf.rb
 extra_rdoc_files: []
@@ -217,6 +218,8 @@ files:
 - README.md
 - bin/add_retention_times.rb
 - bin/augustus_to_proteindb.rb
+ - bin/filter_fasta.rb
+ - bin/filter_psms.rb
 - bin/interprophet.rb
 - bin/make_decoy.rb
 - bin/manage_db.rb