bio-phyta 0.9.1 → 0.9.2

Files changed (9)

data/Gemfile CHANGED

@@ -4,7 +4,7 @@ source "http://rubygems.org"
 # Runtime dependencies
 gem "bio", ">= 1.4.2"
-gem "mysql", ">= 2.8.1"
+gem "mysql2"
 # For JRuby: gem "mysql", "~> 2.8.1"
 gem "sequel", ">= 3.28.0"
 gem "fastercsv", ">= 1.5.4" # only for 1.8.7

data/README.rdoc CHANGED

@@ -1,6 +1,327 @@
 = bio-phyta
-Description goes here.
+PhyTA is a BioRuby program designed to identify and remove contaminant sequences from other (non-target) species in Expressed Sequence Tag (EST) data. PhyTA assigns a higher taxonomic rank to EST sequences based on their BLAST annotation, performs taxonomy-based sequence sorting and constructs a contamination-free sub-library.
+It consists of the following tools:
+[phyta-assign] Annotates sequences with a higher taxonomic rank.
+[phyta-split] Identifies putative contaminant sequences based on the higher taxonomic rank annotation and user-specified criteria.
+[phyta-extract] Constructs two sub-libraries: a “clean” sub-library that consists of annotated sequences from the target species and a “contaminant” one that includes putative contaminant sequences.
+[phyta-setup-taxonomy-db] Facilitates setting up a local copy of the NCBI taxonomy database.
+A detailed description of each tool is provided below.
+All PhyTA scripts are written in Ruby 1.8.7 and are delivered as a Ruby gem. PhyTA has been tested with MRI 1.8.7 and 1.9.2.
+To install PhyTA simply type:
+    gem install bio-phyta
+PhyTA requires Ruby 1.8.7 or higher and a MySQL database. See the "Installation" section for more information.
+== phyta-assign
+phyta-assign parses the NCBI BLASTplus XML format output, assigns a higher taxonomic rank to ESTs based on the BLAST annotation and stores attributes of BLASTplus and the taxonomy assignments in tabular form as a CSV file.
+To generate input for phyta-assign, a large set of query sequences is compared to an NCBI database using the standard stand-alone or network-client BLASTplus programs.
+An example BLASTplus command for generating input for phyta-assign is:
+    blastx -query Corticium_candelabrum.fasta -db BLASTDB/nr -evalue 0.0001 -max_target_seqs 3 -out Corticium_candelabrum_blast5.xml -outfmt 5
+where the resulting Corticium_candelabrum_blast5.xml file can be used as input for phyta-assign.
+The output of phyta-assign contains:
+1. Query sequence ID and the following information for the three top BLAST hits:
+2. accession number
+3. sgi (subject GI number)
+4. e-value
+5. species name
+6. subject annotation
+7. subject score
+8. Higher rank (e.g. Kingdom) taxonomy information
+An example output file could look like this:
+    AW3C1;ACR38454;238014838;3.34982954962278e-19;Zea mays;unknown;78.5665508561758;Viridiplantae
+    AW3C1;XP_002489117;253761439;1.33094019753946e-18;Sorghum bicolor;hypothetical protein SORBIDRAFT_0057s002150;76.6405529765891;Viridiplantae
+    AW3C1;XP_002488963;253760039;1.23820662046332e-15;Sorghum bicolor;hypothetical protein SORBIDRAFT_1150s002010;66.6253640027379;Viridiplantae
+    AW5C3;XP_001629010;156372369;1.85315736381546e-09;Nematostella vectensis;predicted protein;66.2401644268205;Metazoa
+As you can see, the first three entries are the three best BLAST hits for the query sequence AW3C1. They are all assigned to the Kingdom Viridiplantae. The second query sequence has only one hit, from the species <i>Nematostella vectensis</i>, which is assigned to the Kingdom Metazoa.
+The higher rank taxonomy is assigned based on the species name acquired from the hit gi number and NCBI taxonomy information. The default list of the higher rank taxonomic groups used by phyta-assign is provided in the built-in Taxonomy filter I. The default taxonomy list and instructions for creating a custom filter are provided below in the section "Custom filters".
+=== Usage
+Phyta-assign takes the following command line arguments:
+[\--input-file, -i] The output of the BLASTplus alignment in XML format
+[\--output-file, -o]   The name of the output table in CSV format
+[\--database-server, -d] Optional: The address of the MySQL database server (default: localhost)
+[\--database-user, -u] Optional: The name of the database user (default: root)
+[\--database-password, -p] Optional: The password of the database user (default: no password)
+[\--database-name, -n] Optional: The name of the NCBI taxonomy database (default: kingdom_assignment_taxonomy)
+[\--filter, -f] A file in YAML format containing a list of the higher rank taxonomic groups. The default filter information and instructions for creating your own filters can be found in the section "Custom filters".
+[\--help, -h] Show a help message
+Here is an example of how phyta-assign is used from the command line:
+    phyta-assign -i Corticium_candelabrum_blast5.xml -o Corticium_candelabrum_blast5_annotated.csv -d localhost -u root -p password -n kingdom_assignment_taxonomy -f default_filter.yaml
+== phyta-split
+Phyta-split takes the CSV file generated by phyta-assign as input, performs taxonomy-based sorting of the annotated ESTs and outputs two new files in CSV format. One file contains annotations for all ESTs that are deemed to belong to the target species. The second file contains annotations for those sequences whose three top hits all come from taxa defined as contaminants by the phyta-split taxonomy filter.
+=== Usage
+[\--input-file, -i ] The output of phyta-assign in CSV format
+[\--output-clean, -c ] The name of the clean output table in CSV format (default: [name_of_input_file]_clean.csv)
+[\--output-contaminated, -d ] The name of the contaminated output table in CSV format (default: [name_of_input_file]_contaminated.csv)
+[\--filter, -f ] A file in YAML format containing a list of taxa to be considered contaminants (default: use the built-in filter capturing Bacteria, Archaea, Viruses and NONE (unidentified species)).
+[\--help, -h] Show a help message
+Here is an example of how phyta-split can be used from the command line. Note that no custom filter is used, so only the taxa "Bacteria", "Archaea", "Viruses" and "NONE" will be considered contaminants.
+    phyta-split -i Corticium_candelabrum_blast5_annotated.csv -c Corticium_candelabrum_clean.csv -d Corticium_candelabrum_contaminated.csv
+=== Rules
+Sequences are included in the "clean" target-species sub-library annotation when at least one of their three top BLAST hits does not match any taxon in the phyta-split contamination filter. The default filter provided with the program contains the following taxonomic groups: Bacteria, Archaea, Viruses and NONE, which represents {unknown sequences}[http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=12908&lvl=3&keep=1&srchmode=1&unlock].
+== Custom filters
+Custom filters for phyta-assign and phyta-split can be provided in YAML format. This file can be passed to the corresponding tools as a command line parameter. In order to write a custom filter, it is not necessary to learn the YAML syntax.
+Here's what an example filter looks like:
+    # Filter file for PhyTA 0.9
+    ---
+    - Bacteria
+    - Archaea
+    - Viridiplantae
+    - Rhodophyta
+    - Glaucocystophyceae
+    - Alveolata
+    - Cryptophyta
+    - stramenopiles
+    - Amoebozoa
+    - Apusozoa
+    - Euglenozoa
+    - Fornicata
+    - Haptophyceae
+    - Heterolobosea
+    - Jakobida
+    - Katablepharidophyta
+    - Malawimonadidae
+    - Nucleariidae
+    - Oxymonadida
+    - Parabasalia
+    - Rhizaria
+    - unclassified eukaryotes
+    - Fungi
+    - Metazoa
+    - Choanoflagellida
+    - Opisthokonta incertae sedis
+    - Viruses
+The line starting with a # is a comment and is entirely optional. Copy the text above into a new plain text file and modify it to your liking. Make sure to copy the line with the three dashes as well.
+You can also download this filter directly from https://github.com/PalMuc/bio-phyta/raw/master/misc/default_filter.yaml (right click the link and select "save as").
+== phyta-extract
+Constructs two sub-libraries: a “clean” sub-library that consists of annotated sequences from the target species and a “contaminant” one that includes putative contaminant sequences.
+The output files will be written in FASTA format.
+=== Usage
+[\--fasta, -f] The file containing the sequences  in FASTA format
+[\--input-clean, -c] The name of the clean sequence table in CSV format
+[\--input-contaminated, -d] The name of the contaminated sequence table in CSV format
+[\--output-clean, -o ] The name of the FASTA file where clean sequences will be written to
+[\--output-contaminated, -p] The name of the FASTA file where contaminated sequences will be written to
+[\--help, -h] Show a help message
+Here is an example of how phyta-extract can be used from the command line:
+    phyta-extract -f Corticium_candelabrum.fasta -c Corticium_candelabrum_clean.csv -d Corticium_candelabrum_contaminated.csv -o Corticium_candelabrum_clean.fasta -p Corticium_candelabrum_contaminated.fasta
+== Installation
+=== Prerequisites
+In order to install this gem you need to have several programs
+installed:
+* Ruby either in version 1.8.7 or 1.9.2. JRuby unfortunately is not supported at the moment.
+* Git
+* cURL
+* MySQL
+In the following, the installation procedure is given for <b>Mac OS X</b> and <b>Ubuntu Linux 10.10</b>. The commands for Ubuntu have also been tested on <b>Debian Squeeze</b>, although you should replace apt-get with aptitude there.
+Please note that in order to use the sudo command, your user account must be allowed to acquire root user privileges. If this is not the case, please ask your administrator.
+=== Installing Git
+An installer for Mac OS X can be obtained from the {official website}[http://git-scm.com/]. For any Linux distribution it is recommended that you use your system's package manager to install Git. Look for a package called git or git-core. For Ubuntu 10.10 the command is:
+    sudo apt-get install git
+=== Installing cURL
+Mac OS X comes with cURL by default. On a Linux system, cURL can be obtained via the system's package manager. For Ubuntu 10.10 the command is:
+    sudo apt-get install curl
+=== Installing Ruby
+You can find out what version of Ruby comes with your system by typing the following from the command line:
+    ruby -v
+The output of that command looks like this:
+    ruby 1.8.7 (2011-06-30 patchlevel 352) [i686-darwin10.8.0]
+If you have ruby 1.8.7 or higher, you're all set.
+If Ruby is not available on your system or if you have an older version, you should install the most recent version of Ruby.
+The easiest way to install the most recent version of Ruby is via the {Ruby Version Manager}[http://rvm.beginrescueend.com/] (RVM) by Wayne E. Seguin.
+Before you install RVM, make sure you have git and curl installed on your system.
+RVM can be installed by calling:
+    bash < <( curl http://rvm.beginrescueend.com/releases/rvm-install-head )
+This will install RVM to .rvm in your home folder and print several instructions specific to your platform on how to finish the installation. Please pay close attention to the "dependencies" section and look for the part where it says something like this:
+    # For Ruby (MRI & ree)  you should install the following OS dependencies:
+    ruby: /usr/bin/apt-get install build-essential bison openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libssl-dev libyaml-dev libsqlite3-0 libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev ncurses-dev
+It is advisable that you install all of these prerequisites. Please do not copy the commands from this file; use the ones printed by the RVM installer instead.
+If installing any of these packages gives you an error, consider updating your packages by using your system's update manager.
+Next, you have to make sure that RVM is loaded when you start a new shell. Look for the part where it says: "You must now complete the install by loading RVM in new shells."
+On Ubuntu 10.10 you can edit your .bashrc by calling:
+    gedit .bashrc
+On Mac OS X, you can type:
+    open -a TextEdit .bash_profile
+At the very end of this file add the following line:
+    [[ -s "$HOME/.rvm/scripts/rvm" ]] && source "$HOME/.rvm/scripts/rvm"  # This loads RVM into a shell session.
+Now save the file, close your editor and close your shell. Start a new shell and type:
+    type rvm | head -1
+If you see something like "rvm is a function" the installation was
+successful. If you run into problems, read the {documentation}[http://rvm.beginrescueend.com/rvm/install/].
+<b>The following command is not part of the installation procedure!</b>
+You can always delete RVM and start from scratch by typing:
+    rvm implode
+Please note that this will delete all versions of Ruby you installed with RVM as well as all of the gems you installed. It will not reverse the changes you made to your shell's load configuration.
+Now you can install Ruby by calling:
+    rvm install 1.9.2
+Please note that everything RVM installs is placed in the folder .rvm in your home directory. Therefore, it is not necessary to use sudo when calling rvm.
+In order to use Ruby 1.9.2 instead of your system's Ruby version you must type
+    rvm use 1.9.2
+every time you open a new shell. You can check which version you are currently using with:
+    ruby -v
+If you want to switch back to the version of Ruby that came with your system, type:
+    rvm use system
+In order to use Ruby 1.9.2 as the default Ruby implementation on your system you can type:
+    rvm --default use 1.9.2
+Now Ruby 1.9.2 will be called when you type ruby in a new shell.
+=== Installing MySQL
+PhyTA uses a MySQL database in order to store information from the {NCBI taxonomy database}[ftp://ftp.ncbi.nih.gov/pub/taxonomy/] efficiently.
+The database does not have to be hosted on the system that is running PhyTA, but it is advantageous for performance reasons.
+The correct installation procedure for MySQL varies widely among different platforms. For many systems (like Mac OS X) binaries can be obtained from the {official website}[http://www.mysql.com/downloads/mysql/]. In the following, the setup under Ubuntu 10.10 is explained.
+    sudo apt-get install mysql-server
+On Mac OS X, you can install the MySQL preference pane and start the server from there. The MySQL binaries are at /usr/local/mysql/bin/. In order to be able to execute the following examples without having to prefix this path every time, you can add aliases to your bash configuration:
+    open -a TextEdit .bash_profile
+Now add the following lines at the end:
+    alias mysql=/usr/local/mysql/bin/mysql
+    alias mysqladmin=/usr/local/mysql/bin/mysqladmin
+Refer to the ReadMe file that comes with the MySQL installer if you are using tcsh instead of bash.
+=== Starting the database
+Usually the MySQL setup creates an administrator account named "root" with
+an empty password. If your administrator name is different or you have
+set a password, you must adjust the commands in the next section accordingly.
+You can now start MySQL by typing
+    sudo service mysql start
+== phyta-setup-taxonomy-db
+First, you need to set up an empty database for the NCBI taxonomy data. This can be achieved by typing:
+    mysql -u root -ppassword -e "CREATE DATABASE kingdom_assignment_taxonomy"
+In this example, replace root with your MySQL username, password with your password and kingdom_assignment_taxonomy with the database name. Note that the mysql client expects no space between -p and the password. Leave out the parameter -p if your database user does not have a password.
+After that, the program phyta-setup-taxonomy-db will help you set up the NCBI taxonomy database. Its command line options are the following.
+[\--database-server, -d]   Optional: The address of the MySQL database server (default: localhost)
+[\--database-user, -u]   Optional: The name of the database user (default: root)
+[\--database-password, -p]   Optional: The password of the database user (default: no password)
+[\--database-name, -n]   Optional: The name of the NCBI taxonomy database (default: kingdom_assignment_taxonomy)
+[\--help, -h]   Show a help message
+Here is an example command consistent with the example above:
+    phyta-setup-taxonomy-db -d localhost -u root -p password -n kingdom_assignment_taxonomy
+Phyta-setup-taxonomy-db will now download the NCBI taxonomy dump files and load them into your MySQL database. This might take a while.
 == Contributing to bio-phyta
@@ -14,6 +335,12 @@ Description goes here.
 == Copyright
-Copyright (c) 2011 Philipp Comans. See LICENSE.txt for
-further details.
+Copyright (c) 2011 Philipp Comans.
+The MySQL schema used in phyta-setup-taxonomy-db and phyta-assign has been developed by Matthew Horton of the Department of Ecology and Evolution of the Division of Biological Sciences at the University of Chicago and is available at http://bergelson.uchicago.edu/Members/mhorton/taxonomydb.build.
+See LICENSE.txt for further details.
+== Acknowledgements
+Development of this program was supported by the {Molecular Geo- and Palaeobiology Lab}[http://www.mol-palaeo.de/] of the Department of Earth and Environmental Sciences and the initiative "{Gleichstellung in Forschung und Lehre}[http://www.frauenbeauftragte.uni-muenchen.de/foerdermoegl/lmu1/tg73/index.html]" of the Ludwig-Maximilians-University Munich (LMU).
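
The "Rules" and "Custom filters" sections of the new README describe how phyta-split decides whether a query is clean. The following is a minimal Ruby sketch of that rule, not the gem's actual implementation. It assumes that rows are semicolon-separated as in the README's example output, that the higher rank is the last field, and that no field contains a semicolon. The names `load_filter` and `split_rows` are hypothetical.

```ruby
require 'yaml'

# Default contaminant taxa named in the README for phyta-split.
DEFAULT_CONTAMINANTS = ['Bacteria', 'Archaea', 'Viruses', 'NONE'].freeze

# Hypothetical helper: a filter file is a plain YAML list of taxon names.
def load_filter(yaml_text)
  YAML.load(yaml_text)
end

# Hypothetical helper: group phyta-assign rows by query ID and apply the
# README rule. A query is "clean" when at least one of its hits is assigned
# to a taxon that is NOT in the contaminant list.
# Returns [clean_query_ids, contaminated_query_ids].
def split_rows(lines, contaminants = DEFAULT_CONTAMINANTS)
  hits_by_query = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    fields = line.strip.split(';') # assumption: no ';' inside a field
    next if fields.empty?
    hits_by_query[fields.first] << fields
  end

  clean = []
  contaminated = []
  hits_by_query.each do |query_id, hits|
    taxa = hits.map { |fields| fields.last } # assumption: rank is the last column
    if taxa.any? { |taxon| !contaminants.include?(taxon) }
      clean << query_id
    else
      contaminated << query_id
    end
  end
  [clean, contaminated]
end
```

A real implementation would more likely read the file with a CSV parser configured for the semicolon separator, since annotation text could contain the separator character.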

data/VERSION CHANGED

@@ -1 +1 @@
-0.9.1
+0.9.2

data/bin/phyta-assign CHANGED

@@ -30,12 +30,11 @@ if RUBY_PLATFORM =~ /java/
   puts "You are running JRuby, the jdbc/mysql database connector will be used."
   require 'jdbc/mysql'
 else
-  require 'mysql'
+  require 'mysql2'
 end
 require 'sequel'
 require 'nokogiri'
-require 'bio'
 require 'yaml'
 require 'csv'

data/bin/phyta-setup-taxonomy-db CHANGED

@@ -20,7 +20,7 @@ unless opts[:database_password_given]
 end
 #Connect to the target database
-connect_string = 'mysql://'+ opts[:database_server] + '/' + opts[:database_name] + '?user=' + opts[:database_user]
+connect_string = 'mysql2://'+ opts[:database_server] + '/' + opts[:database_name] + '?user=' + opts[:database_user]
 if !opts[:database_password].nil?
   connect_string = connect_string + '&password=' + opts[:database_password]
@@ -31,7 +31,7 @@ if RUBY_PLATFORM =~ /java/
   require 'jdbc/mysql'
   connect_string = 'jdbc:' + connect_string
 else
-  require 'mysql'
+  require 'mysql2'
 end
 PROTEIN_TABLE_NAME = 'proteinGiToTaxonId'
@@ -54,7 +54,7 @@ ftp.login
 files = ftp.chdir('pub/taxonomy/')
 #Do the following in a temporary directory, automatically delete it afterwards
-Dir.mktmpdir do |dir|
+Dir.mktmpdir() do |dir|
   Dir.chdir(dir)
   tax_dmp = 'taxdump.tar.gz'
@@ -119,11 +119,11 @@ database.run "TRUNCATE #{NAMES_TABLE_NAME}"
 database.run "TRUNCATE #{NODES_TABLE_NAME}"
 database.run "TRUNCATE #{PROTEIN_TABLE_NAME}"
-database.run "LOAD DATA INFILE '#{dir}/gi_taxid_prot.dmp' INTO TABLE #{PROTEIN_TABLE_NAME} FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' (gi,taxonid);"
+database.run "LOAD DATA LOCAL INFILE '#{dir}/gi_taxid_prot.dmp' INTO TABLE #{PROTEIN_TABLE_NAME} FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' (gi,taxonid);"
-database.run "LOAD DATA INFILE '#{dir}/names.dmp' INTO TABLE #{NAMES_TABLE_NAME} FIELDS TERMINATED BY '\t|\t' LINES TERMINATED BY '\t|\n' (taxonid, name, uniquename, class);"
+database.run "LOAD DATA LOCAL INFILE '#{dir}/names.dmp' INTO TABLE #{NAMES_TABLE_NAME} FIELDS TERMINATED BY '\t|\t' LINES TERMINATED BY '\t|\n' (taxonid, name, uniquename, class);"
-database.run "LOAD DATA INFILE '#{dir}/nodes.dmp' INTO TABLE #{NODES_TABLE_NAME} FIELDS TERMINATED BY '\t|\t' LINES TERMINATED BY '\t|\n' (taxonid, parenttaxonid,rank,embl_code,division_id,inherited_div_flag,genetic_code_id,inherited_gc_flag, mitochondrial_genetic_codeid,inherited_mgc_flag,genBank_hidden_flag,hidden_subtree_root_flag,comments);"
+database.run "LOAD DATA LOCAL INFILE '#{dir}/nodes.dmp' INTO TABLE #{NODES_TABLE_NAME} FIELDS TERMINATED BY '\t|\t' LINES TERMINATED BY '\t|\n' (taxonid, parenttaxonid,rank,embl_code,division_id,inherited_div_flag,genetic_code_id,inherited_gc_flag, mitochondrial_genetic_codeid,inherited_mgc_flag,genBank_hidden_flag,hidden_subtree_root_flag,comments);"
 end
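
The hunks above keep the download and load steps inside a `Dir.mktmpdir` block, so the dump files disappear once the block finishes. The sketch below illustrates only that block-form behaviour. The method name `with_scratch_directory` and the placeholder file contents are assumptions, and the sketch builds paths with `File.join` instead of calling `Dir.chdir` as the script does.

```ruby
require 'tmpdir'

# Hypothetical helper: create a temporary directory, hand it to the caller,
# and return its path after the block form of Dir.mktmpdir has removed it.
def with_scratch_directory
  used_path = nil
  Dir.mktmpdir do |dir|
    used_path = dir
    # Placeholder standing in for a downloaded dump file.
    File.write(File.join(dir, 'names.dmp'), "1\t|\troot\t|\n")
    yield dir
  end
  used_path # the directory and its contents no longer exist at this point
end
```

Because the files exist only inside the block, the `LOAD DATA` statements have to run before it ends, which matches where the diff places them. The switch to `LOAD DATA LOCAL INFILE` makes the client read those files and send them to the server, which matters when the MySQL server runs on a different host than the script.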

data/lib/blast_string_parser.rb CHANGED

@@ -1,6 +1,3 @@
-# To change this template, choose Tools | Templates
-# and open the template in the editor.
 class BlastStringParser
   def initialize

data/lib/kingdom_db.rb CHANGED

@@ -13,7 +13,7 @@ class KingdomDB
   def initialize(server, user, password, database)
-    connect_string = 'mysql://'+ server + '/' + database + '?user=' + user
+    connect_string = 'mysql2://'+ server + '/' + database + '?user=' + user
     if !password.nil?
       connect_string = connect_string + '&password=' + password
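
Both `KingdomDB#initialize` and phyta-setup-taxonomy-db assemble the connection URL by plain string concatenation, so a user name or password containing characters such as `&` or `@` would end up unescaped in the query string. The following sketch shows one way to build the same `mysql2://` URL with escaping. The method name `build_connect_string` is hypothetical, and the sketch assumes that whatever consumes the URL decodes percent-encoded query values.

```ruby
require 'uri'

# Hypothetical helper: build the same kind of URL as the diff above, but
# percent-encode the user and password before placing them in the query.
def build_connect_string(server, database, user, password = nil)
  url = "mysql2://#{server}/#{database}?user=#{URI.encode_www_form_component(user)}"
  unless password.nil?
    url += "&password=#{URI.encode_www_form_component(password)}"
  end
  url
end
```

For plain alphanumeric credentials the result is identical to the concatenated string in the diff, so the escaping only changes behaviour for values that would otherwise break the URL.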

data/misc/default_filter.yaml ADDED

@@ -0,0 +1,29 @@
+# Filter file for PhyTA 0.9
+---
+- Bacteria
+- Archaea
+- Viridiplantae
+- Rhodophyta
+- Glaucocystophyceae
+- Alveolata
+- Cryptophyta
+- stramenopiles
+- Amoebozoa
+- Apusozoa
+- Euglenozoa
+- Fornicata
+- Haptophyceae
+- Heterolobosea
+- Jakobida
+- Katablepharidophyta
+- Malawimonadidae
+- Nucleariidae
+- Oxymonadida
+- Parabasalia
+- Rhizaria
+- unclassified eukaryotes
+- Fungi
+- Metazoa
+- Choanoflagellida
+- Opisthokonta incertae sedis
+- Viruses

metadata CHANGED

@@ -1,138 +1,189 @@
---- !ruby/object:Gem::Specification
+--- !ruby/object:Gem::Specification
 name: bio-phyta
-version: !ruby/object:Gem::Version
-  version: 0.9.1
+version: !ruby/object:Gem::Version
+  hash: 63
   prerelease:
+  segments:
+  - 0
+  - 9
+  - 2
+  version: 0.9.2
 platform: ruby
-authors:
+authors:
 - Philipp Comans
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2011-10-21 00:00:00.000000000Z
-dependencies:
-- !ruby/object:Gem::Dependency
-  name: bio
-  requirement: &2153022740 !ruby/object:Gem::Requirement
+date: 2011-11-30 00:00:00 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  requirement: &id001 !ruby/object:Gem::Requirement
     none: false
-    requirements:
-    - - ! '>='
-      - !ruby/object:Gem::Version
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        hash: 3
+        segments:
+        - 1
+        - 4
+        - 2
         version: 1.4.2
-  type: :runtime
+  version_requirements: *id001
+  name: bio
   prerelease: false
-  version_requirements: *2153022740
-- !ruby/object:Gem::Dependency
-  name: mysql
-  requirement: &2153022260 !ruby/object:Gem::Requirement
-    none: false
-    requirements:
-    - - ! '>='
-      - !ruby/object:Gem::Version
-        version: 2.8.1
   type: :runtime
+- !ruby/object:Gem::Dependency
+  requirement: &id002 !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        hash: 3
+        segments:
+        - 0
+        version: "0"
+  version_requirements: *id002
+  name: mysql2
   prerelease: false
-  version_requirements: *2153022260
-- !ruby/object:Gem::Dependency
-  name: sequel
-  requirement: &2153021780 !ruby/object:Gem::Requirement
+  type: :runtime
+- !ruby/object:Gem::Dependency
+  requirement: &id003 !ruby/object:Gem::Requirement
     none: false
-    requirements:
-    - - ! '>='
-      - !ruby/object:Gem::Version
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        hash: 119
+        segments:
+        - 3
+        - 28
+        - 0
         version: 3.28.0
-  type: :runtime
+  version_requirements: *id003
+  name: sequel
   prerelease: false
-  version_requirements: *2153021780
-- !ruby/object:Gem::Dependency
-  name: fastercsv
-  requirement: &2153021300 !ruby/object:Gem::Requirement
+  type: :runtime
+- !ruby/object:Gem::Dependency
+  requirement: &id004 !ruby/object:Gem::Requirement
     none: false
-    requirements:
-    - - ! '>='
-      - !ruby/object:Gem::Version
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        hash: 11
+        segments:
+        - 1
+        - 5
+        - 4
         version: 1.5.4
-  type: :runtime
+  version_requirements: *id004
+  name: fastercsv
   prerelease: false
-  version_requirements: *2153021300
-- !ruby/object:Gem::Dependency
-  name: nokogiri
-  requirement: &2153020820 !ruby/object:Gem::Requirement
+  type: :runtime
+- !ruby/object:Gem::Dependency
+  requirement: &id005 !ruby/object:Gem::Requirement
     none: false
-    requirements:
-    - - ! '>='
-      - !ruby/object:Gem::Version
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        hash: 3
+        segments:
+        - 1
+        - 5
+        - 0
         version: 1.5.0
-  type: :runtime
+  version_requirements: *id005
+  name: nokogiri
   prerelease: false
-  version_requirements: *2153020820
-- !ruby/object:Gem::Dependency
-  name: trollop
-  requirement: &2153020340 !ruby/object:Gem::Requirement
+  type: :runtime
+- !ruby/object:Gem::Dependency
+  requirement: &id006 !ruby/object:Gem::Requirement
     none: false
-    requirements:
-    - - ! '>='
-      - !ruby/object:Gem::Version
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        hash: 83
+        segments:
+        - 1
+        - 16
+        - 2
         version: 1.16.2
-  type: :runtime
+  version_requirements: *id006
+  name: trollop
   prerelease: false
-  version_requirements: *2153020340
-- !ruby/object:Gem::Dependency
-  name: shoulda
-  requirement: &2153019860 !ruby/object:Gem::Requirement
+  type: :runtime
+- !ruby/object:Gem::Dependency
+  requirement: &id007 !ruby/object:Gem::Requirement
     none: false
-    requirements:
-    - - ! '>='
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :development
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        hash: 3
+        segments:
+        - 0
+        version: "0"
+  version_requirements: *id007
+  name: shoulda
   prerelease: false
-  version_requirements: *2153019860
-- !ruby/object:Gem::Dependency
-  name: bundler
-  requirement: &2153019380 !ruby/object:Gem::Requirement
+  type: :development
+- !ruby/object:Gem::Dependency
+  requirement: &id008 !ruby/object:Gem::Requirement
     none: false
-    requirements:
+    requirements:
     - - ~>
-      - !ruby/object:Gem::Version
+      - !ruby/object:Gem::Version
+        hash: 23
+        segments:
+        - 1
+        - 0
+        - 0
         version: 1.0.0
-  type: :development
+  version_requirements: *id008
+  name: bundler
   prerelease: false
-  version_requirements: *2153019380
-- !ruby/object:Gem::Dependency
-  name: jeweler
-  requirement: &2153018900 !ruby/object:Gem::Requirement
+  type: :development
+- !ruby/object:Gem::Dependency
+  requirement: &id009 !ruby/object:Gem::Requirement
     none: false
-    requirements:
+    requirements:
     - - ~>
-      - !ruby/object:Gem::Version
+      - !ruby/object:Gem::Version
+        hash: 7
+        segments:
+        - 1
+        - 6
+        - 4
         version: 1.6.4
-  type: :development
+  version_requirements: *id009
+  name: jeweler
   prerelease: false
-  version_requirements: *2153018900
-- !ruby/object:Gem::Dependency
-  name: rcov
-  requirement: &2153018420 !ruby/object:Gem::Requirement
-    none: false
-    requirements:
-    - - ! '>='
-      - !ruby/object:Gem::Version
-        version: '0'
   type: :development
+- !ruby/object:Gem::Dependency
+  requirement: &id010 !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        hash: 3
+        segments:
+        - 0
+        version: "0"
+  version_requirements: *id010
+  name: rcov
   prerelease: false
-  version_requirements: *2153018420
+  type: :development
 description: Pipeline to remove contaminations from EST libraries
 email: philipp.comans@googlemail.com
-executables:
+executables:
+- phyta-split
 - phyta-assign
 - phyta-extract
 - phyta-setup-taxonomy-db
-- phyta-split
 extensions: []
-extra_rdoc_files:
+extra_rdoc_files:
 - LICENSE.txt
 - README.rdoc
-files:
+files:
 - .document
 - Gemfile
 - LICENSE.txt
@@ -145,6 +196,7 @@ files:
 - bin/phyta-split
 - lib/blast_string_parser.rb
 - lib/kingdom_db.rb
+- misc/default_filter.yaml
 - test/helper.rb
 - test/test_blackbox_assign.rb
 - test/test_blackbox_extract.rb
@@ -152,31 +204,37 @@ files:
 - test/test_blast_string_parser.rb
 - test/test_kingdom_db.rb
 homepage: https://github.com/PalMuc/bio-phyta
-licenses:
+licenses:
 - LGPL
 post_install_message:
 rdoc_options: []
-require_paths:
+require_paths:
 - lib
-required_ruby_version: !ruby/object:Gem::Requirement
+required_ruby_version: !ruby/object:Gem::Requirement
   none: false
-  requirements:
-  - - ! '>='
-    - !ruby/object:Gem::Version
-      version: '0'
-      segments:
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      hash: 3
+      segments:
       - 0
-      hash: -3130547697683155421
-required_rubygems_version: !ruby/object:Gem::Requirement
+      version: "0"
+required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
-  requirements:
-  - - ! '>='
-    - !ruby/object:Gem::Version
-      version: '0'
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      hash: 3
+      segments:
+      - 0
+      version: "0"
 requirements: []
 rubyforge_project:
 rubygems_version: 1.8.10
 signing_key:
 specification_version: 3
 summary: Pipeline to remove contaminations from EST libraries
 test_files: []
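
The metadata records the new mysql2 dependency with the requirement `>= 0` because the Gemfile entry carries no version constraint, while bundler and jeweler use the pessimistic `~>` operator. As a small illustration of what those requirement strings accept, the sketch below uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The helper name `satisfies?` and the version numbers passed to it are chosen for illustration only.

```ruby
require 'rubygems'

# Hypothetical helper: does the given version string satisfy the requirement?
def satisfies?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

puts satisfies?('>= 0', '0.3.10')     # any released version passes
puts satisfies?('>= 1.4.2', '1.4.1')  # below the minimum recorded for bio
puts satisfies?('~> 1.0.0', '1.0.21') # same 1.0.x series as bundler's constraint
puts satisfies?('~> 1.0.0', '1.1.0')  # outside the 1.0.x series
```

An unconstrained dependency like `mysql2` accepts any future release, including ones with incompatible changes, which is worth keeping in mind when reading this diff.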