RubyGems - protk - Versions diffs - 1.1.2 → 1.1.4 - Mend

protk 1.1.2 → 1.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

data/README.md +12 -16
data/bin/feature_finder.rb +38 -31
data/bin/gffmerge.rb +199 -0
data/bin/protk_setup.rb +26 -4
data/bin/sixframe.rb +62 -0
data/bin/toppas_pipeline.rb +73 -0
data/lib/protk/constants.rb +29 -10
data/lib/protk/data/ExecutePipeline.trf +7 -0
data/lib/protk/data/FeatureFinderIsotopeWavelet.ini +26 -0
data/lib/protk/data/brew_packages.yaml +8 -2
data/lib/protk/data/galaxyenv.sh +17 -0
data/lib/protk/fastadb.rb +48 -0
data/lib/protk/galaxy_stager.rb +2 -0
data/lib/protk/openms_defaults.rb +11 -0
data/lib/protk/setup_rakefile.rake +51 -6
metadata +14 -2

data/README.md CHANGED

@@ -23,6 +23,8 @@ On OSX
     rvm install 1.9.3 --with-gcc=clang
     rvm use 1.9.3
     gem install protk
+    protk_setup.rb package_manager
+    protk_setup.rb system_packages
     protk_setup.rb all
 On Linux
@@ -30,7 +32,7 @@ On Linux
     rvm install 1.9.3
     rvm use 1.9.3
     gem install protk
-    sudo protk_setup.rb system_dependencies
+    sudo protk_setup.rb system_packages
     protk_setup all
@@ -63,25 +65,19 @@ Although all the protk tools can be run directly from the command-line a nicer w
 2. Make the protk tools available to galaxy.
     - Create a directory for galaxy tool dependencies. It's best if this directory is outside the galaxy-dist directory. I usually create a directory called `tool_depends` alongside `galaxy-dist`.
     - Open the file `universe_wsgi.ini` in the `galaxy-dist` directory and set the configuration option `tool_dependency_dir` to point to the directory you just created
-    - Create a symbolic link from the protk directory to the appropriate subdirectory of `<tool_dependency_dir>`. In the instructions below substitute 1.0.0 for the version number of [the protk galaxy tools](https://bitbucket.org/iracooke/protk-toolshed "protk galaxy tools") you are using.
+    - Create a protkgem directory inside `<tool_dependency_dir>`.
             cd <tool_dependency_dir>
-            mkdir protk
-			cd protk
-            mkdir 1.0.0
-            ln -s 1.0.0 default
-            ln -s <path_where_protk_was_installed> 1.0.0/bin
+            mkdir protkgem
+			cd protkgem
+            mkdir rvm193
+            ln -s rvm193 default
+            cd default
+            ln -s ~/.protk/galaxy/env.sh env.sh
-3. Configure the shell in which galaxy tools will run.
-    - Create a symlink to the `env.sh` file so it will be sourced by galaxy as it runs each tool. This file should have been autogenerated by `setup.sh`
+3. Install any of the Proteomics tools that depend on protk from the galaxy toolshed
-            ln -s <path_where_protk_was_installed>/env.sh 1.0.0/env.sh
-4. Install the protk galaxy wrapper tools from the galaxy toolshed. You will need to restart galaxy after doing so for the new datatype sniffers to be activated.
-5. After installing the protk wrapper tools from the toolshed it will be necessary to tell those tools about databases you have installed. Use the manage_db.rb tool to do this. To do this, first edit config.yml to make sure the `galaxy_root` setting points to the root directory of your galaxy installation (this will allow `manage_db.rb` to update the `pepxml_databases.loc` file inside `galaxy_root/tool-data`). The run the following command and then restart the galaxy server;
-		manage_db.rb list -G
+4. After installing the protk wrapper tools from the toolshed it will be necessary to tell those tools about databases you have installed. Use the manage_db.rb tool to do this.
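
The new `protkgem` layout in the README hunk above can be sketched in Ruby. This is a minimal illustration, not part of protk: `stage_protkgem` is a hypothetical helper, and a temporary directory plus a placeholder `env.sh` stand in for the real `<tool_dependency_dir>` and `~/.protk/galaxy/env.sh`.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not part of protk): mirrors the README shell steps
#   mkdir protkgem; cd protkgem; mkdir rvm193
#   ln -s rvm193 default; cd default; ln -s <env_file> env.sh
def stage_protkgem(tool_dependency_dir, env_file)
  gem_dir = File.join(tool_dependency_dir, 'protkgem')
  FileUtils.mkdir_p(File.join(gem_dir, 'rvm193'))

  # Relative link, so "default" keeps working if the tree is moved.
  default_link = File.join(gem_dir, 'default')
  FileUtils.ln_s('rvm193', default_link) unless File.symlink?(default_link)

  env_link = File.join(default_link, 'env.sh')
  FileUtils.ln_s(env_file, env_link) unless File.symlink?(env_link)
  env_link
end

# Demo against throwaway paths (assumptions, not real Galaxy locations).
demo_root = Dir.mktmpdir('tool_depends')
demo_env  = File.join(demo_root, 'env.sh')
File.write(demo_env, "# placeholder for ~/.protk/galaxy/env.sh\n")

staged_link = stage_protkgem(demo_root, demo_env)
puts File.readlink(staged_link)
```

Because `default` points at `rvm193`, the `env.sh` link physically lives in `protkgem/rvm193/`, which is what Galaxy would source through the `default` path.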

data/bin/feature_finder.rb CHANGED

@@ -1,37 +1,32 @@
+#!/usr/bin/env ruby
 #
 # This file is part of protk
 # Created by Ira Cooke 21/3/2012
 #
 # A wrapper for the OpenMS FeatureFinder tools (FeatureFinderCentroided and FeatureFinderIsotopeWavelet)
-#
-#
-#!/bin/sh
-if [ -z "$PROTK_RUBY_PATH" ] ; then
-  PROTK_RUBY_PATH=`which ruby`
-fi
-eval 'exec "$PROTK_RUBY_PATH" $PROTK_RUBY_FLAGS -rubygems -x -S $0 ${1+"$@"}'
-echo "The 'exec \"$PROTK_RUBY_PATH\" -x -S ...' failed!" >&2
-exit 1
-#! ruby
-#
-$LOAD_PATH.unshift("#{File.dirname(__FILE__)}/lib/")
+require 'protk/constants'
+require 'protk/command_runner'
+require 'protk/tool'
+require 'protk/openms_defaults'
+require 'libxml'
-require 'constants'
-require 'command_runner'
-require 'tool'
+include LibXML
-# Setup specific command-line options for this tool. Other options are inherited from ProphetTool
-#
-tool=Tool.new({:explicit_output=>true, :background=>true,:over_write=>true})
+tool=Tool.new({:explicit_output=>true, :background=>true,:over_write=>true,:prefix_suffix=>true})
 tool.option_parser.banner = "Find molecular features on a set of input files.\n\nUsage: feature_finder.rb [options] file1.mzML file2.mzML ..."
-tool.options.profile = false
-tool.option_parser.on( '--profile',"Input files are profile data" ) do
-  tool.options.profile = true
+tool.options.intensity_type = "ref"
+tool.option_parser.on( '--intensity-type type',"method used to calculate intensities (ref,trans,corrected). Default = ref. See OpenMS documentation for details" ) do |type|
+  tool.options.intensity_type = type
 end
+tool.options.intensity_threshold = "3"
+tool.option_parser.on( '--intensity-threshold thresh',"discard features below this intensity (Default=3). Set to -1 to retain all detected features" ) do |thresh|
+  tool.options.intensity_threshold = thresh
+end
 tool.option_parser.parse!
 # Obtain a global environment object
@@ -42,32 +37,44 @@ def run_ff(genv,tool,cmd,output_path,jobid)
     genv.log("Skipping analysis on existing file #{output_path}",:warn)
   else
     jobscript_path="#{output_path}.pbs.sh"
-    job_params={:jobid=>jobid, :vmem=>"12Gb", :queue => "sixteen"}
+    job_params={:jobid=>jobid, :vmem=>"14Gb", :queue => "sixteen"}
     code=tool.run(cmd,genv,job_params,jobscript_path)
     throw "Command failed with exit code #{code}" unless code==0
   end
 end
+def generate_ini(tool,out_path)
+  base_ini_file=OpenMSDefaults.new.featurefinderisotopewavelet
+  parser = XML::Parser.file(base_ini_file)
+  doc = parser.parse
+  intensity_threshold_node = doc.find('//ITEM[@name="intensity_threshold"]')[0]
+  intensity_type_node = doc.find('//ITEM[@name="intensity_type"]')[0]
+  intensity_threshold_node['value']=tool.intensity_threshold
+  intensity_type_node['value']=tool.intensity_type
+  doc.save(out_path)
+end
 throw "Cannot use explicit output in combination with multiple input files" if ( tool.explicit_output && ARGV.length>1)
-throw "The profile option is not yet implemented" if ( tool.profile )
-ini_file="#{File.dirname(__FILE__)}/params/FeatureFinderCentroided.ini"
+ini_file="#{Pathname.new(ARGV[0]).dirname.realpath.to_s}/feature_finder.ini"
+generate_ini(tool,ini_file)
 ARGV.each do |filen|
   input_file=filen.chomp
   throw "Input must be an mzML file" unless input_file=~/\.mzML$/
   input_basename=input_file.gsub(/\.mzML$/,'')
-  output_filename=tool.explicit_output
-  output_file="#{input_basename}.featureXML" if output_filename==nil
+  output_dir=Pathname.new(input_basename).dirname.realpath.to_s
+  output_base=Pathname.new(input_basename).basename.to_s
+  output_file = "#{output_dir}/#{tool.output_prefix}#{output_base}#{tool.output_suffix}.featureXML"
   if ( tool.over_write || !Pathname.new(output_file).exist? )
-    output_dir=Pathname.new(output_file).dirname.realpath.to_s
     output_base_filename=Pathname.new(output_file).basename.to_s
     cmd=""
-    cmd<<"#{genv.openms_root}/FeatureFinderCentroided -in #{Pathname.new(input_file).realpath.to_s} -out #{output_dir}/#{output_base_filename} -ini #{ini_file}"
+    cmd<<"export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.protk/tools/openms/lib;
+#{genv.featurefinderisotopewavelet} -in #{Pathname.new(input_file).realpath.to_s} -out #{output_dir}/#{output_base_filename} -ini #{ini_file}"
     run_ff(genv,tool,cmd,output_file,tool.jobid_from_filename(input_basename))
   else
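
The output naming introduced in this hunk (directory, then prefix, base name, suffix and `.featureXML`) can be sketched on its own. This is a simplified illustration under assumptions: `feature_output_path` is a hypothetical helper, the prefix and suffix are passed in directly rather than read from `tool.output_prefix` and `tool.output_suffix`, and `realpath` is omitted so the example does not need the files to exist.

```ruby
require 'pathname'

# Hypothetical helper: builds the featureXML path the way the hunk does,
# minus the realpath call (which requires the directory to exist).
def feature_output_path(input_file, prefix = '', suffix = '')
  raise ArgumentError, 'Input must be an mzML file' unless input_file =~ /\.mzML$/

  input_basename = input_file.sub(/\.mzML$/, '')
  output_dir  = Pathname.new(input_basename).dirname.to_s
  output_base = Pathname.new(input_basename).basename.to_s
  "#{output_dir}/#{prefix}#{output_base}#{suffix}.featureXML"
end

puts feature_output_path('/data/run1.mzML', 'ff_', '_wavelet')
puts feature_output_path('run1.mzML')
```

With an empty prefix and suffix the result sits next to the input file, matching the behaviour of the removed `"#{input_basename}.featureXML"` line.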

data/bin/gffmerge.rb ADDED

@@ -0,0 +1,199 @@
+#!/usr/bin/env ruby
+#
+# This file is part of protk
+# Original python version created by Max Grant
+# Translated to ruby by Ira Cooke 29/1/2013
+#
+#
+require 'protk/constants'
+require 'protk/tool'
+require 'protk/fastadb'
+require 'libxml'
+require 'bio'
+include LibXML
+tool=Tool.new(:explicit_output=>true)
+tool.option_parser.banner = "Create a gff containing peptide observations.\n\nUsage: gffmerge.rb "
+tool.options.gff_predicted=nil
+tool.option_parser.on( '-g filename','--gff filename', 'Predicted Data (GFF3 Format)' ) do |file|
+  tool.options.gff_predicted=file
+end
+tool.options.protxml=nil
+tool.option_parser.on( '-p filename','--protxml filename', 'Observed Data (ProtXML Format)' ) do |file|
+  tool.options.protxml=file
+end
+tool.options.sixframe=nil
+tool.option_parser.on( '-t filename','--sixframe filename', 'Sixframe Translations (Fasta Format)' ) do |file|
+  tool.options.sixframe=file
+end
+tool.options.skip_fasta_indexing=false
+tool.option_parser.on('--skip-index','Don\'t index sixframe translations (Index should already exist)') do
+  tool.options.skip_fasta_indexing=true
+end
+tool.options.peptide_probability_threshold=0.95
+tool.option_parser.on('--threshold prob','Peptide Probability Threshold (Default 0.95)') do |thresh|
+  tool.options.peptide_probability_threshold=thresh.to_f
+end
+# Checking for required options
+begin
+  tool.option_parser.parse!
+  mandatory = [:protxml,:sixframe]
+  missing = mandatory.select{ |param| tool.send(param).nil? }
+  if not missing.empty?
+    puts "Missing options: #{missing.join(', ')}"
+    puts tool.option_parser
+    exit
+  end
+rescue OptionParser::InvalidOption, OptionParser::MissingArgument
+  puts $!.to_s
+  puts tool.option_parser
+  exit
+end
+gff_out_file="merged.gff"
+if ( tool.explicit_output != nil)
+  gff_out_file=tool.explicit_output
+end
+gff_db = Bio::GFF.new()
+if ( tool.gff_predicted !=nil)
+  p "Reading source gff file"
+  gff_db = Bio::GFF::GFF3.new(File.open(tool.gff_predicted))
+  # p gff_db.records[1].attributes
+  # exit
+end
+f = open(gff_out_file,'w+')
+gff_db.records.each { |rec|
+  f.write(rec.to_s)
+}
+p "Parsing proteins from protxml"
+protxml_parser=XML::Parser.file(tool.protxml)
+protxml_doc=protxml_parser.parse
+proteins = protxml_doc.find('.//protxml:protein','protxml:http://regis-web.systemsbiology.net/protXML')
+p "Indexing sixframe translations"
+db_filename = Pathname.new(tool.sixframe).realpath.to_s
+if tool.skip_fasta_indexing
+  orf_lookup = FastaDB.new(db_filename)
+else
+  orf_lookup = FastaDB.create(db_filename,db_filename,'prot')
+end
+p "Aligning peptides and writing GFF data..."
+low_prob = 0
+skipped = 0
+peptide_count = 0
+protein_count = 0
+total_peptides = 0
+for prot in proteins
+  prot_prob = prot['probability']
+  indis_proteins = prot.find('protxml:indistinguishable_protein','protxml:http://regis-web.systemsbiology.net/protXML')
+  prot_names = [prot['protein_name']]
+  for protein in indis_proteins
+    prot_names += [protein['protein_name']]
+  end
+  peptides = prot.find('protxml:peptide','protxml:http://regis-web.systemsbiology.net/protXML')
+  for protein_name in prot_names
+    protein_count += 1
+    prot_qualifiers = {"source" => "OBSERVATION", "score" => prot_prob, "ID" => 'pr' + protein_count.to_s}
+    begin
+      p "Looking up #{protein_name}"
+      orf = orf_lookup.get_by_id protein_name
+      if ( orf == nil)
+        raise KeyError
+      end
+      position = orf.identifiers.description.split('|').collect { |pos| pos.to_i }
+      if ( position.length != 2 )
+        raise EncodingError
+      end
+      orf_name = orf.entry_id.scan(/lcl\|(.*)/)[0][0]
+      frame=orf_name.scan(/frame_(\d)/)[0][0]
+      scaffold_name = orf_name.scan(/(scaffold_\d+)/)[0][0]
+      # strand = frame > 3 ? -1 : 1
+      strand = +1
+      prot_id = "pr#{protein_count.to_s}"
+      prot_attributes = [["ID",prot_id]]
+      prot_gff_line = Bio::GFF::GFF3::Record.new(seqid = scaffold_name,source="OBSERVATION",feature_type="protein",
+        start_position=position[0],end_position=position[1],score=prot_prob,strand=strand,frame=frame,attributes=prot_attributes)
+      gff_db.records += [prot_gff_line]
+      prot_seq = orf.aaseq.to_s
+      throw "Not amino_acids" if prot_seq != orf.seq.to_s
+      for peptide in peptides
+        pprob = peptide['nsp_adjusted_probability'].to_f
+        if ( pprob >= tool.peptide_probability_threshold )
+          total_peptides += 1
+          pep_seq = peptide['peptide_sequence']
+          start_indexes = [0]
+          prot_seq.scan /#{pep_seq}/  do |match|
+              start_indexes << prot_seq.index(match,start_indexes.last)
+          end
+          start_indexes.delete_at(0)
+          # Now convert peptide coordinate to genome coordinates
+          # And create gff lines for each match
+          start_indexes.collect do |si|
+            pep_genomic_start = position[0] + 3*si
+            pep_genomic_end = pep_genomic_start + 3*pep_seq.length
+            peptide_count+=1
+            pep_attributes = [["ID","p#{peptide_count.to_s}"],["Parent",prot_id]]
+            pep_gff_line = Bio::GFF::GFF3::Record.new(seqid = scaffold_name,source="OBSERVATION",
+              feature_type="peptide",start_position=pep_genomic_start,end_position=pep_genomic_end,score=pprob,
+              strand=strand,frame=frame,attributes=pep_attributes)
+            gff_db.records += [pep_gff_line]
+            # p pep_gff_line
+          end
+        end
+      end
+    rescue KeyError,EncodingError
+      skipped+=0
+      p "Lookup failed for #{protein_name}"
+    end
+    # p orf_name
+    # p prot_gff_line
+    # exit
+  end
+end
+f = open(gff_out_file,'w+')
+gff_db.records.each { |rec|
+  f.write(rec.to_s)
+}
+f.close
+p "Finished."
+p "Proteins: #{protein_count}"
+p "Skipped Decoys: #{skipped}"
+p "Total Peptides: #{total_peptides}"
+p "Peptides Written: #{total_peptides - low_prob}"
+p "Peptides Culled: #{low_prob}"
+exit(0)
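
The peptide-to-genome mapping in `gffmerge.rb` can be sketched without BioRuby. This is a minimal illustration under stated assumptions, not the script's own code: both helper names are hypothetical, only the forward strand is handled (the script hard-codes `strand = +1`), each search restarts one residue past the previous hit so repeated peptides yield distinct offsets, and the end coordinate is inclusive, as GFF3 expects.

```ruby
# Hypothetical helper: zero-based offsets of every occurrence of a peptide
# within a protein sequence, including overlapping occurrences.
def peptide_offsets(protein_seq, peptide_seq)
  offsets = []
  search_from = 0
  while (idx = protein_seq.index(peptide_seq, search_from))
    offsets << idx
    search_from = idx + 1 # move past this hit so the next search advances
  end
  offsets
end

# Hypothetical helper: 1-based inclusive genomic span of a peptide that
# starts `offset` residues into an ORF beginning at `orf_start`
# (forward strand only; three nucleotides per residue).
def genomic_span(orf_start, offset, peptide_length)
  span_start = orf_start + 3 * offset
  [span_start, span_start + 3 * peptide_length - 1]
end

offsets = peptide_offsets('MKAAKMKA', 'MKA')
p offsets
p offsets.map { |o| genomic_span(100, o, 3) }
```

The inclusive end differs by one from `pep_genomic_start + 3*pep_seq.length` in the hunk above; which convention downstream tools expect is worth checking before relying on either.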

data/bin/protk_setup.rb CHANGED

@@ -16,16 +16,38 @@ require 'pp'
 # Setup specific command-line options for this tool. Other options are inherited from Tool
 #
 tool=SetupTool.new
-if ( tool.option_parser.banner=="")
-  tool.option_parser.banner = "Post install tasks for protk.\nUsage: protk_setup.rb [options] toolname"
-end
+tool.option_parser.banner = "Post install tasks for protk.\nUsage: protk_setup.rb [options] toolname"
 tool.option_parser.parse!
+if ( ARGV.length < 1)
+	p "You must supply a setup task [all,system_packages]"
+	p tool.option_parser
+	exit
+end
+# Checking for required options
+# begin
+#   tool.option_parser.parse!
+#   mandatory = [:gff_predicted, :protxml,:sixframe]
+#   missing = mandatory.select{ |param| tool.send(param).nil? }
+#   if not missing.empty?
+#     puts "Missing options: #{missing.join(', ')}"
+#     puts tool.option_parser
+#     exit
+#   end
+# rescue OptionParser::InvalidOption, OptionParser::MissingArgument
+#   puts $!.to_s
+#   puts tool.option_parser
+#   exit
+# end
 # Create install directory if it doesn't already exist
 #
 env=Constants.new
 ARGV.each do |toolname|
-  tool.install toolname
+	p toolname
+	tool.install toolname
 end

data/bin/sixframe.rb ADDED

@@ -0,0 +1,62 @@
+#!/usr/bin/env ruby
+#
+# This file is part of protk
+# Original python version created by Max Grant
+# Translated to ruby by Ira Cooke 7/2/2013
+#
+#
+require 'protk/constants'
+require 'protk/tool'
+require 'bio'
+tool=Tool.new(:explicit_output=>true)
+tool.option_parser.banner = "Create a sixframe translation of a genome.\n\nUsage: sixframe.rb [options] genome.fasta"
+tool.option_parser.parse!
+inname=ARGV.shift
+outfile=File.open("#{inname}.translated.fasta",'w')
+if ( tool.explicit_output != nil)
+  outfile=File.open(tool.explicit_output,'w')
+end
+file = Bio::FastaFormat.open(inname)
+file.each do |entry|
+  length = entry.naseq.length
+  (1...7).each do |frame|
+    translated_seq= entry.naseq.translate(frame)
+    orfs=translated_seq.split("*")
+    orf_index = 0
+    position = ((frame - 1) % 3) + 1
+    oi=0
+    orfs.each do |orf|
+      oi+=1
+      if ( orf.length > 20 )
+        position_start = position
+        position_end = position_start + orf.length*3 -1
+        if ( frame > 3)
+            position_start = length - (position - 1)
+            position_end = position_start - orf.length * 3 + 1
+        end
+        # Create accession compliant with NCBI naming standard
+        # See http://www.ncbi.nlm.nih.gov/books/NBK7183/?rendertype=table&id=ch_demo.T5
+        ncbi_scaffold_id = entry.entry_id.gsub('|','_').gsub(' ','_')
+        ncbi_accession = "lcl|#{ncbi_scaffold_id}_frame_#{frame}_orf_#{oi}"
+        # Output in fasta format
+        outfile.write(">#{ncbi_accession} #{position_start}|#{position_end}\n#{orf}\n")
+      end
+      position += orf.length*3+3
+    end
+  end
+end
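
The coordinate arithmetic in `sixframe.rb` can be followed more easily in isolation. The sketch below copies that arithmetic into a hypothetical helper, `orf_coordinates`, which takes an already translated string instead of calling BioRuby's `translate`; the minimum ORF length is a parameter here (the script fixes it at 20) so that short toy sequences produce output.

```ruby
# Hypothetical helper mirroring the position bookkeeping in sixframe.rb.
# translated_seq: protein string with '*' marking stop codons
# frame:          1..6 (the script treats frames 4..6 as reverse strand)
# genome_length:  nucleotide length of the source sequence
def orf_coordinates(translated_seq, frame, genome_length, min_length = 20)
  position = ((frame - 1) % 3) + 1
  coords = []
  translated_seq.split('*').each do |orf|
    if orf.length > min_length
      if frame > 3
        # Reverse frames count back from the end of the sequence.
        first = genome_length - (position - 1)
        last  = first - orf.length * 3 + 1
      else
        first = position
        last  = first + orf.length * 3 - 1
      end
      coords << [first, last]
    end
    # Skip the ORF plus the three nucleotides of its stop codon.
    position += orf.length * 3 + 3
  end
  coords
end

p orf_coordinates('MKT*AAAA', 1, 30, 2)
p orf_coordinates('MKT*AAAA', 4, 30, 2)
```

For reverse frames the start is larger than the end, which is how the script writes them into the FASTA description line (`start|end`).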

data/bin/toppas_pipeline.rb ADDED

@@ -0,0 +1,73 @@
+#!/usr/bin/env ruby
+#
+# This file is part of protk
+# Created by Ira Cooke 30/01/13
+#
+# A wrapper for the OpenMS tool ExecutePipeline.
+# Executes simple toppas pipelines, automatically creating the trf file.
+require 'protk/constants'
+require 'protk/command_runner'
+require 'protk/tool'
+require 'protk/openms_defaults'
+require 'tempfile'
+require 'libxml'
+include LibXML
+tool=Tool.new({:explicit_output=>false, :background=>true,:over_write=>false})
+tool.option_parser.banner = "Execute a toppas pipeline with a single inputs node\n\nUsage: toppas_pipeline.rb [options] input1 input2 ..."
+tool.options.outdir = ""
+tool.option_parser.on( '--outdir dir',"save outputs to dir" ) do |dir|
+  tool.options.outdir = dir
+end
+tool.options.toppas_file = ""
+tool.option_parser.on( '--toppas-file f',"the toppas file to run" ) do |file|
+  tool.options.toppas_file = file
+end
+tool.option_parser.parse!
+# Obtain a global environment object
+genv=Constants.new
+def run_pipeline(genv,tool,cmd,output_path,jobid)
+  jobscript_path="#{output_path}.pbs.sh"
+  job_params={:jobid=>jobid, :vmem=>"14Gb", :queue => "sixteen"}
+  code=tool.run(cmd,genv,job_params,jobscript_path)
+  throw "Command failed with exit code #{code}" unless code==0
+end
+def generate_trf(input_files,out_path)
+  p OpenMSDefaults.new.trf_path
+  parser=XML::Parser.file(OpenMSDefaults.new.trf_path)
+  doc=parser.parse
+  itemlist_node=doc.find('/PARAMETERS/NODE/ITEMLIST')[0]
+  input_files.each do |f|
+    mnode=XML::Node.new('LISTITEM')
+    mnode["value"]="file://#{Pathname.new(f).realpath.to_s}"
+    itemlist_node << mnode
+  end
+  p out_path
+  doc.save(out_path)
+end
+throw "outdir is a required parameter" if tool.outdir==""
+throw "toppas-file is a required parameter" if tool.toppas_file==""
+throw "outdir must exist" unless Dir.exist?(tool.outdir)
+trf_path = "#{tool.toppas_file}.trf"
+generate_trf(ARGV,trf_path)
+cmd=""
+cmd<<"export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.protk/tools/openms/lib;
+#{genv.executepipeline} -in #{Pathname.new(tool.toppas_file).realpath.to_s} -out_dir #{Pathname.new(tool.outdir).realpath.to_s} -resource_file #{Pathname.new(trf_path).realpath.to_s}"
+run_pipeline(genv,tool,cmd,tool.outdir,tool.jobid_from_filename(tool.toppas_file))

data/lib/protk/constants.rb CHANGED

@@ -121,15 +121,6 @@ class Constants
   def omssa2pepxml
     return "#{self.omssa_root}/omssa2pepXML"
   end
-  def openms_root
-    path=@env['openms_root']
-    if ( path =~ /^\// )
-      return path
-    else
-      return "#{@protk_dir}/#{@env['openms_root']}"
-    end
-  end
   def msgfplus_root
     path=@env['msgfplus_root']
@@ -161,6 +152,22 @@ class Constants
     return "#{self.pwiz_root}/msconvert"
   end
+  def openms_root
+    path=@env['openms_root']
+    if ( path =~ /^\//)
+      return path
+    else
+      return "#{@protk_dir}/#{@env['openms_root']}"
+    end
+  end
+  def featurefinderisotopewavelet
+    return "#{self.openms_root}/bin/FeatureFinderIsotopeWavelet"
+  end
+  def executepipeline
+    return "#{self.openms_root}/bin/ExecutePipeline"
+  end
   def protein_database_root
     path=@env['protein_database_root']
@@ -187,6 +194,10 @@ class Constants
   def makeblastdb
     return "#{self.blast_root}/bin/makeblastdb"
   end
+  def searchblastdb
+    return "#{self.blast_root}/bin/blastdbcmd"
+  end
   def log_file
     path=@env['log_file']
@@ -209,10 +220,18 @@ class Constants
     default_config_yml = YAML.load_file "#{File.dirname(__FILE__)}/data/default_config.yml"
     throw "Unable to read the config file at #{File.dirname(__FILE__)}/data/default_config.yml" unless default_config_yml!=nil
-    @env=default_config_yml
+    user_config_yml = nil
+    user_config_yml = YAML.load_file "#{@protk_dir}/config.yml" if File.exist? "#{@protk_dir}/config.yml"
+    if ( user_config_yml !=nil )
+      @env = default_config_yml.merge user_config_yml
+    else
+      @env=default_config_yml
+    end
     throw "No data found in config file" unless @env!=nil
     @info_level=default_config_yml['message_level']
   end
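
The last hunk layers a user `config.yml` over the packaged defaults with `Hash#merge`, and the `*_root` accessors resolve relative paths against the protk directory. A minimal sketch of both behaviours follows; the YAML values and the `/home/user/.protk` directory are invented for illustration, and `resolve_root` is a hypothetical stand-in for accessors such as `openms_root`.

```ruby
require 'yaml'

# Hypothetical stand-in for accessors like Constants#openms_root:
# absolute paths pass through, relative ones hang off protk_dir.
def resolve_root(protk_dir, path)
  path =~ /^\// ? path : "#{protk_dir}/#{path}"
end

# Example values only; the real defaults live in data/default_config.yml.
default_config = YAML.load("openms_root: tools/openms\nmessage_level: info\n")
user_config    = YAML.load("openms_root: /opt/openms\n")

# Keys in the user file win; keys it omits fall back to the defaults.
env = default_config.merge(user_config)

puts env['openms_root']
puts env['message_level']
puts resolve_root('/home/user/.protk', default_config['openms_root'])
puts resolve_root('/home/user/.protk', env['openms_root'])
```

Note that in the hunk `@info_level` is still read from `default_config_yml`, so a `message_level` set in the user file would not appear to affect it.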

data/lib/protk/data/ExecutePipeline.trf ADDED

@@ -0,0 +1,7 @@
+<?xml version="1.0" encoding="ISO-8859-1"?>
+<PARAMETERS version="1.3" xsi:noNamespaceSchemaLocation="http://open-ms.sourceforge.net/schemas/Param_1_3.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+  <NODE name="1" description="">
+    <ITEMLIST name="url_list" type="string" description="">
+    </ITEMLIST>
+  </NODE>
+</PARAMETERS>

data/lib/protk/data/FeatureFinderIsotopeWavelet.ini ADDED

@@ -0,0 +1,26 @@
+<?xml version="1.0" encoding="ISO-8859-1"?>
+<PARAMETERS version="1.3" xsi:noNamespaceSchemaLocation="http://open-ms.sourceforge.net/schemas/Param_1_3.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+  <NODE name="FeatureFinderIsotopeWavelet" description="Detects two-dimensional features in LC-MS data.">
+    <ITEM name="version" value="1.9.0" type="string" description="Version of the tool that generated this parameters file." tags="advanced" />
+    <NODE name="1" description="Instance &apos;1&apos; section for &apos;FeatureFinderIsotopeWavelet&apos;">
+      <ITEM name="in" value="" type="string" description="input file" tags="input file,required" restrictions="*.mzML" />
+      <ITEM name="out" value="" type="string" description="output file" tags="output file,required" restrictions="*.featureXML" />
+      <ITEM name="log" value="" type="string" description="Name of log file (created only when specified)" tags="advanced" />
+      <ITEM name="debug" value="0" type="int" description="Sets the debug level" tags="advanced" />
+      <ITEM name="threads" value="1" type="int" description="Sets the number of threads allowed to be used by the TOPP tool" />
+      <ITEM name="no_progress" value="false" type="string" description="Disables progress logging to command line" tags="advanced" restrictions="true,false" />
+      <ITEM name="test" value="false" type="string" description="Enables the test mode (needed for internal use only)" tags="advanced" restrictions="true,false" />
+      <NODE name="algorithm" description="Algorithm section">
+        <ITEM name="max_charge" value="3" type="int" description="The maximal charge state to be considered." restrictions="1:" />
+        <ITEM name="intensity_threshold" value="3" type="float" description="The final threshold t&apos; is build upon the formula: t&apos; = av+t*sd, where t is the intensity_threshold, av the average intensity within the wavelet transformed signal and sd the standard deviation of the transform. If you set intensity_threshold=-1, t&apos; will be zero.#br#As the &apos;optimal&apos; value for this parameter is highly data dependent, we would recommend to start with -1, which will also extract features with very low signal-to-noise ratio. Subsequently, one might increase the threshold to find an optimized trade-off between false positives and true positives. Depending on the dynamic range of your spectra, suitable value ranges include: -1, [0:10], and if your data features even very high intensity values, t can also adopt values up to around 30. Please note that this parameter is not of an integer type, s.t. you can also use t:=0.1, e.g." />
+        <ITEM name="intensity_type" value="ref" type="string" description="Determines the intensity type returned for the identified features. &apos;ref&apos; (default) returns the sum of the intensities of each isotopic peak within an isotope pattern. &apos;trans&apos; refers to the intensity of the monoisotopic peak within the wavelet transform. &apos;corrected&apos; refers also to the transformed intensity with an attempt to remove the effects of the convolution. While the latter ones might be preferable for qualitative analyses, &apos;ref&apos; might be the best option to obtain quantitative results. Please note that intensity values might be spoiled (in particular for the option &apos;ref&apos;), as soon as patterns overlap (see also the explanations given in the class documentation of FeatureFinderAlgorihtmIsotopeWavelet)." tags="advanced" restrictions="ref,trans,corrected" />
+        <ITEM name="check_ppm" value="true" type="string" description="Enables/disables a ppm test vs. the averagine model, i.e. potential peptide masses are checked for plausibility. In addition, a heuristic correcting potential mass shifts induced by the wavelet is applied." tags="advanced" restrictions="true,false" />
+        <ITEM name="hr_data" value="false" type="string" description="Must be true in case of high-resolution data, i.e. for spectra featuring large m/z-gaps (present in FTICR and Orbitrap data, e.g.). Please check a single MS scan out of your recording, if you are unsure." restrictions="true,false" />
+        <NODE name="sweep_line" description="">
+          <ITEM name="rt_votes_cutoff" value="5" type="int" description="Defines the minimum number of subsequent scans where a pattern must occur to be considered as a feature." tags="advanced" restrictions="0:" />
+          <ITEM name="rt_interleave" value="1" type="int" description="Defines the maximum number of scans (w.r.t. rt_votes_cutoff) where an expected pattern is missing. There is usually no reason to change the default value." tags="advanced" restrictions="0:" />
+        </NODE>
+      </NODE>
+    </NODE>
+  </NODE>
+</PARAMETERS>

data/lib/protk/data/brew_packages.yaml CHANGED

@@ -4,7 +4,13 @@
 common:
 - wget
+- cpanm
+- libxml2
 - gd
 - libpng12
-- cpanm
-- libxml2
+openms:
+- autoconf
+- automake
+- libtool
+- cmake

data/lib/protk/data/galaxyenv.sh ADDED

@@ -0,0 +1,17 @@
+temp_file=`mktemp /tmp/protkXXX`
+export temp_file
+bash << %%%
+[[ -s "$HOME/.rvm/scripts/rvm" ]] && source "$HOME/.rvm/scripts/rvm"
+rvm 1.9.3
+export | grep 'declare -x' | sed 's/declare -x/export/g' > $temp_file
+%%%
+. $temp_file
+rm $temp_file

data/lib/protk/fastadb.rb ADDED

@@ -0,0 +1,48 @@
+require 'protk/constants'
+require 'bio'
+#
+# Warning: Uses Bio::Command which is a private API of the Bio package
+#
+class FastaDB
+  def initialize(blast_database_file_path)
+    env = Constants.new
+    @database = blast_database_file_path
+    @makedbcmd = env.makeblastdb
+    @searchdbcmd = env.searchblastdb
+  end
+  def self.create(blast_database_file_path,input_fasta_filepath,type='nucl')
+    db = FastaDB.new(blast_database_file_path)
+    db.make_index(input_fasta_filepath,type)
+    db
+  end
+  def get_by_id(entry_id)
+    fetch(entry_id).shift
+  end
+  def make_index(input_fasta,dbtype)
+    cmd = [ @makedbcmd, '-in', input_fasta, '-parse_seqids','-out',@database,'-dbtype',dbtype]
+    res = Bio::Command.call_command(cmd) do |io|
+      puts io.read
+    end
+  end
+  def fetch(list)
+    if list.respond_to?(:join)
+      entry_id = list.join(",")
+    else
+      entry_id = list
+    end
+    cmd = [ @searchdbcmd, '-db', @database, '-entry', entry_id ]
+    Bio::Command.call_command(cmd) do |io|
+      io.close_write
+      Bio::FlatFile.new(Bio::FastaFormat, io).to_a
+    end
+  end
+end
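
`FastaDB#fetch` accepts either a single ID or a list and turns it into one `-entry` argument. The argument-building step can be sketched without BioRuby or a BLAST install; `blastdbcmd_args` is a hypothetical helper, and the command path and database name below are placeholders.

```ruby
# Hypothetical helper: builds the argument vector that FastaDB#fetch
# hands to Bio::Command.call_command, without running anything.
def blastdbcmd_args(searchdbcmd, database, ids)
  entry_id = ids.respond_to?(:join) ? ids.join(',') : ids
  [searchdbcmd, '-db', database, '-entry', entry_id]
end

# Placeholder command path, database name and IDs.
p blastdbcmd_args('blastdbcmd', 'sixframe.fasta', 'lcl|scaffold_1_frame_1_orf_2')
p blastdbcmd_args('blastdbcmd', 'sixframe.fasta', %w[id1 id2 id3])
```

Passing the command as an array, as the class does, avoids shell quoting problems with IDs that contain characters such as `|`.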

data/lib/protk/galaxy_stager.rb CHANGED

@@ -1,3 +1,5 @@
+$VERBOSE=nil
 require 'pathname'
 class GalaxyStager

data/lib/protk/openms_defaults.rb ADDED

@@ -0,0 +1,11 @@
+require 'libxml'
+include LibXML
+class OpenMSDefaults
+	attr :featurefinderisotopewavelet
+	attr :trf_path
+	def initialize
+		@featurefinderisotopewavelet="#{File.dirname(__FILE__)}/data/FeatureFinderIsotopeWavelet.ini"
+		@trf_path = "#{File.dirname(__FILE__)}/data/ExecutePipeline.trf"
+	end
+end

data/lib/protk/setup_rakefile.rake CHANGED

@@ -22,7 +22,7 @@ end
 def supports_package_manager name
 	res = %x[which #{name}]
-	(res == "")
+	(res != "")
 end
 def clean_build_dir
@@ -53,9 +53,10 @@ task :package_manager do
 	end
 	if needs_homebrew
-		sh { "ruby -e \"$(curl -fsSkL raw.github.com/mxcl/homebrew/go)" }
-		sh { "brew update"}
-		sh { "brew tap homebrew/versions"}
+		puts "Installing Homebrew"
+		sh %{ ruby -e \"$(curl -fsSkL raw.github.com/mxcl/homebrew/go)\" }
+		sh %{ brew update}
+		sh %{ brew tap homebrew/versions}
 	end
 end
@@ -265,7 +266,7 @@ end
 def platform_bunzip
 	if RbConfig::CONFIG['host_os'] =~ /darwin/
-		return 'pbunzip2'
+		return 'bunzip2'
 	end
 	'bunzip2'
 end
@@ -287,6 +288,50 @@ end
 task :pwiz => pwiz_installed_file
-task :all => [:tpp,:omssa,:blast,:msgfplus,:pwiz]
+#
+# openms
+#
+def platform_cmake_args
+	if RbConfig::CONFIG['host_os'] =~ /darwin/
+		return '-D CMAKE_CXX_COMPILER=/usr/bin/g++ -D CMAKE_C_COMPILER=/usr/bin/gcc '
+	end
+	''
+end
+openms_version="1.9.0"
+openms_packagefile="OpenMS-#{openms_version}.tar.gz"
+openms_url="https://dl.dropbox.com/u/226794/#{openms_packagefile}"
+openms_installed_file="#{env.featurefinderisotopewavelet}"
+download_task openms_url, openms_packagefile
+file openms_installed_file => [@build_dir,"#{@download_dir}/#{openms_packagefile}"] do
+	sh %{cp #{@download_dir}/#{openms_packagefile} #{@build_dir}}
+    sh %{cd #{@build_dir}; gunzip -f #{openms_packagefile}}
+    sh %{cd #{@build_dir}; tar -xvf #{openms_packagefile.chomp('.gz')}}
+    sh %{mkdir -p #{env.openms_root}}
+    sh %{cd #{@build_dir}/OpenMS-#{openms_version}/contrib; cmake #{platform_cmake_args} .}
+    sh %{cd #{@build_dir}/OpenMS-#{openms_version}; cmake -D INSTALL_PREFIX=#{env.openms_root} .}
+    sh %{cd #{@build_dir}/OpenMS-#{openms_version}; make install}
+end
+task :openms => openms_installed_file
+#
+# Galaxy Environment
+#
+protk_galaxy_envfile = "#{env.protk_dir}/galaxy/env.sh"
+file protk_galaxy_envfile do
+	sh %{mkdir -p #{env.protk_dir}/galaxy}
+	this_dir=File.dirname(__FILE__)
+	sh %{cp #{this_dir}/data/galaxyenv.sh #{protk_galaxy_envfile}}
+end
+task :galaxy => protk_galaxy_envfile
+task :all => [:tpp,:omssa,:blast,:msgfplus,:pwiz,:openms,:galaxy]
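
The first hunk in this file flips `supports_package_manager` from `res == ""` to `res != ""`, so a non-empty `which` result now means the package manager is present. An alternative that avoids shelling out is to walk the search path directly. The sketch below is one possible approach, not what protk does: `executable_on_path?` is a hypothetical helper, and the demo uses a temporary directory with a fake `brew` script instead of the real `PATH`.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: true if an executable file called `name` exists in
# any directory of search_path (defaults to the PATH environment variable).
def executable_on_path?(name, search_path = ENV['PATH'].to_s)
  search_path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Demo with a throwaway directory standing in for PATH.
demo_bin  = Dir.mktmpdir('protk_path_demo')
fake_brew = File.join(demo_bin, 'brew')
File.write(fake_brew, "#!/bin/sh\n")
FileUtils.chmod(0755, fake_brew)

puts executable_on_path?('brew', demo_bin)    # true
puts executable_on_path?('apt-get', demo_bin) # false
```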

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: protk
 version: !ruby/object:Gem::Version
-  version: 1.1.2
+  version: 1.1.4
   prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-12-19 00:00:00.000000000 Z
+date: 2013-01-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ftools
@@ -166,6 +166,10 @@ executables:
 - unimod_to_loc.rb
 - generate_omssa_loc.rb
 - uniprot_mapper.rb
+- feature_finder.rb
+- toppas_pipeline.rb
+- gffmerge.rb
+- sixframe.rb
 extensions:
 - ext/protk/extconf.rb
 extra_rdoc_files: []
@@ -178,11 +182,13 @@ files:
 - lib/protk/convert_util.rb
 - lib/protk/data/make_uniprot_table.rb
 - lib/protk/eupathdb_gene_information_table.rb
+- lib/protk/fastadb.rb
 - lib/protk/galaxy_stager.rb
 - lib/protk/galaxy_util.rb
 - lib/protk/manage_db_tool.rb
 - lib/protk/mascot_util.rb
 - lib/protk/omssa_util.rb
+- lib/protk/openms_defaults.rb
 - lib/protk/pepxml.rb
 - lib/protk/plasmodb.rb
 - lib/protk/prophet_tool.rb
@@ -207,6 +213,7 @@ files:
 - bin/feature_finder.rb
 - bin/file_convert.rb
 - bin/generate_omssa_loc.rb
+- bin/gffmerge.rb
 - bin/interprophet.rb
 - bin/libra.rb
 - bin/make_decoy.rb
@@ -220,8 +227,10 @@ files:
 - bin/protein_prophet.rb
 - bin/protk_setup.rb
 - bin/repair_run_summary.rb
+- bin/sixframe.rb
 - bin/tandem_search.rb
 - bin/template_search.rb
+- bin/toppas_pipeline.rb
 - bin/unimod_to_loc.rb
 - bin/uniprot_mapper.rb
 - bin/xls_to_table.rb
@@ -230,7 +239,10 @@ files:
 - lib/protk/data/apt-get_packages.yaml
 - lib/protk/data/brew_packages.yaml
 - lib/protk/data/default_config.yml
+- lib/protk/data/ExecutePipeline.trf
 - lib/protk/data/FeatureFinderCentroided.ini
+- lib/protk/data/FeatureFinderIsotopeWavelet.ini
+- lib/protk/data/galaxyenv.sh
 - lib/protk/data/predefined_db.crap.yaml
 - lib/protk/data/predefined_db.sphuman.yaml
 - lib/protk/data/predefined_db.swissprot_annotation.yaml