
snp-search 2.2.0 → 2.3.0


Files changed (21)

data/Gemfile +1 -2
data/Gemfile.lock +2 -3
data/README +0 -105
data/README.rdoc +35 -29
data/Rakefile +2 -2
data/VERSION +1 -1
data/bin/snp-search +174 -261
data/lib/create_methods.rb +196 -0
data/lib/filter_ignore_snps_methods.rb +130 -0
data/lib/information_methods.rb +117 -0
data/lib/output_information_methods.rb +131 -0
data/lib/snp-search.rb +18 -280
data/lib/snp_db_connection.rb +1 -2
data/lib/snp_db_models.rb +3 -3
data/lib/snp_db_schema.rb +119 -80
data/pkg/snp-search-1.1.0.gem +0 -0
data/pkg/snp-search-1.2.0.gem +0 -0
data/pkg/snp-search-2.3.0.gem +0 -0
data/snp-search.gemspec +15 -12
metadata +73 -33
data/.rspec +0 -1

data/Gemfile CHANGED

@@ -5,10 +5,9 @@ source "http://rubygems.org"
  gem "activerecord", "~> 3.1.3"
  gem "bio", "~> 1.4.2"
- gem "slop", "~> 3.3.1"
+ gem "slop", "~> 2.4.0"
  gem 'sqlite3', "~> 1.3.4"
  gem 'activerecord-import', "~> 0.2.8"
- gem "diff-lcs", "~> 1.1.3"
 # Add dependencies to develop your gem here.
 # Include everything needed to run rake, tests, features, etc.

data/Gemfile.lock CHANGED

@@ -36,7 +36,7 @@ GEM
     rspec-expectations (2.3.0)
       diff-lcs (~> 1.1.2)
     rspec-mocks (2.3.0)
-    slop (3.3.1)
+    slop (2.4.0)
     sqlite3 (1.3.4)
     tzinfo (0.3.31)
@@ -48,9 +48,8 @@ DEPENDENCIES
   activerecord-import (~> 0.2.8)
   bio (~> 1.4.2)
   bundler (~> 1.0.0)
-  diff-lcs (~> 1.1.3)
   jeweler (~> 1.6.4)
   rcov
   rspec (~> 2.3.0)
-  slop (~> 3.3.1)
+  slop (~> 2.4.0)
   sqlite3 (~> 1.3.4)

data/README CHANGED

@@ -1,105 +0,0 @@
-= snp-search
-SNPsearch is a tool that manages SNP data and allows for data importing, manipulating, editing and complex querying of SNP data.  It can be used to evaluate the utility of SNPs for the assessment of genetic diversity between haploid strains and the management of genotype and phenotype data.  Once the database is created, the user is provided with several query and output options. SNPsearch is particularly useful in the analysis of phylogenetic trees that are based on SNP differences across whole core genomes.  Queries can be made to answer critical genomic questions such as the association of SNPs with particular phenotypes.
-== Obtaining and installing the code
-SNPsearch is written in Ruby and operates in a Unix environment.  It is made available as a gem. See the github site for more information (https://github.com/hpa-bioinformatics/snp-search).
-To install snp-search, do
-  gem install snp-search
-== Requirements
-Not much, you just need:
-* Unix. Once snp-search is installed, all the necessary gems to run snp-search will also be installed from Rubygems (note that Rubygems requires admin privileges.  If you do not have admin privileges then we suggest you install RVM: (http://beginrescueend.com/rvm/install/) and then gem install snp-search).
-* ruby version 1.8.7 and above.
-* Optional: FastTree.  If you require a tree output in Newick format, you must install FastTree from http://www.microbesonline.org/fasttree/#Install.  You must specify the path of the executable in your .bashrc or .profile file as snp-search will run the command as just 'FastTree' and will not know where FastTree is if it is not specified in your .bashrc or .profile file.
-Thats it!
-== Running snp-search
-1- Creating the database (snp-search -create)
-  Two files are needed to create the SQLite3 database:
-  1- Variant Call Format (.vcf) file (which contains the SNP information)
-  2- Your database reference genome that you used to generate your .vcf file (in genbank or embl format, the script will automatically detect the format).
-You need the following parameters:
-  -n	Name of your database
-  -v	.vcf file
-  -d	Database Reference genome (The same file that was used in generating the .vcf file).  This should be in genbank or embl format.
-  Other options:
-  -c	SNP quality score cutoff.  A Phred-scaled quality score. High quality scores indicate high confidence calls. Optional, default = 90 (out of 100)
-  -g	Genotype Quality score cutoff. Phred-scaled quality score that the genotype is true.	Optional, default = 30
-  -h	help message
-  Usage:
-    snp-search -create -n my_snp_db.sqlite3 -d my_ref.gbk -v my_vcf_file.vcf
-  Note: The strain names in your database will be taken from your vcf file so make sure they are named appropriately in your vcf file.
-2- Querying the Database (snp-search -query)
-  Two queries are currently scripted in SNPsearch:
-  1- unique_snps: This option queries the database and selects the number of unique SNPs within the list of the strains/samples provided.  The output is the number of unique SNPs.
-  You need the following parameters:
-  -n  Name of your database
-  -s  The strains/samples you like to query
-  Usage:
-    snp-search -n my_snp_db.sqlite3 -s list_of_my_strains.txt
-  2- not_include_snps_from_gene: This option queries the database to select only those SNPs not found in a specified gene. These SNPs are used to make a concatenated SNP multiple alignment file (FASTA format).  This is a way of removing a set of genes (likely to be mobile element genes) that are not needed for SNP analysis.  The user has the option of generating a core SNP tree Newick file for SNP phylogeny.
-  You need the following parameters:
-  -n  Name of your database
-  -a  The gene you like to remove from analysis
-  -o  Output file, in fasta format
-  options:
-  -t  Generate SNP phylogeny
-  -w  Output tree in Newick format
-  Usage (phage is used as the example gene):
-  snp-search -n my_snp_db.sqlite3 -a phage -o snps_sequences_without_phage.fasta -t -w snps_sequences_without_phage.nwk
-  The algorithm FastTree is used to generate the nwk file.  FastTree can be downloaded from http://www.microbesonline.org/fasttree/#Install (see above)
-  3- Output database (snp-search -out_file)
-  You need the following parameters:
-  -n  Name of your database
-  -o  Output file containing the database in fasta format
-== View database in Unix or in a GUI
-Your database will be in sqlite3 format.  If you like to view your table(s) and perform direct queries you can type
-  sqlite3 snp_db.sqlite3
-Alternatively, you may download a SQL tool to view your database (e.g. SQLite sorcerer).
-== Contact
-If you have any comments, questions or suggestions, please email
-  ali.al-shahib@hpa.org.uk
-or
-  anthony.underwood@hpa.org.uk
-Have fun snp-searching!
-== Copyright
-Copyright (c) 2012 Ali Al-Shahib. See LICENSE.txt for
-further details.

data/README.rdoc CHANGED

@@ -21,17 +21,17 @@ Thats it!
 == Running snp-search
-1- Creating the database (snp-search -create)
+1- The first thing you need to do is to create the database (snp-search -create)
   Two files are needed to create the SQLite3 database:
-  1- Variant Call Format (.vcf) file (which contains the SNP information)
+  1A- Variant Call Format (.vcf) file (which contains the SNP information)
-  2- Your database reference genome that you used to generate your .vcf file (in genbank or embl format, the script will automatically detect the format).
+  1B- Your database reference genome that you used to generate your .vcf file (in genbank or embl format, the script will automatically detect the format).
 You need the following parameters:
-  -n	Name of your database
+  -n	Name of your database (note that this is a required field in all commands).
   -v	.vcf file
   -d	Database Reference genome (The same file that was used in generating the .vcf file).  This should be in genbank or embl format.
@@ -45,43 +45,49 @@ You need the following parameters:
   Note: The strain names in your database will be taken from your vcf file so make sure they are named appropriately in your vcf file.
-2- Querying the Database (snp-search -query)
+2- Now that you have created the database (my_snp_db.sqlite3) you can use snp-search to output several queried data.
-  Two queries are currently scripted in SNPsearch:
+  2A- First, you should choose which output format you like:
+    -f, --fasta: output fasta file format (not available with -unique_snps option)
+    -T, --tabular: output tabular file format
-  1- unique_snps: This option queries the database and selects the number of unique SNPs within the list of the strains/samples provided.  The output is the number of unique SNPs.
+  2B- Next, you need to tell snp-search what you want out.  You have several options:
+    - Querying the Database to select the number of unique SNPs within the list of the strains/samples provided (list_of_my_strains.txt). The output is a text file with a list of the unique SNPs and information about each SNP (e.g. if its synonymous or non-synonymous SNP).
-  You need the following parameters:
+  -u, --unique_snps                      Query for unique snps in the database (only used with -tabular option)
+  -s, --strain                           The strains/samples you like to query (only used with -unique_snps flag)
+  Usage:
+  snp-search -n my_snp_db.sqlite3 -O -T -u -n my_snp_db.sqlite3 -s list_of_my_strains.txt -o unique_snps.out
-  -n  Name of your database
-  -s  The strains/samples you like to query
+  - Querying the database to output all SNPs without specified features in the database (e.g. phages).  This is a way of removing a set of genes (likely to be mobile element genes) that are not needed for SNP analysis.  The user has the option of generating a core SNP tree Newick file for SNP phylogeny (if -F option was used to ouput fasta file).
-  Usage:
-    snp-search -n my_snp_db.sqlite3 -s list_of_my_strains.txt
-  2- not_include_snps_from_gene: This option queries the database to select only those SNPs not found in a specified gene. These SNPs are used to make a concatenated SNP multiple alignment file (FASTA format).  This is a way of removing a set of genes (likely to be mobile element genes) that are not needed for SNP analysis.  The user has the option of generating a core SNP tree Newick file for SNP phylogeny.
-  You need the following parameters:
+  -e, --ignore_snps_from_feature         Ignore SNPs from specified features in the database
+  -r, --remove_non_informative_snps      Only output informative SNPs
+  -I, --ignore_snps_in_range             A list of position ranges to ignore e.g 10..500,2000..2500
+  -R, --ignore_strains                   A list of strains to ignore (seperate by comma e.g. S1,S4,S8 )
+  -a, --annotation                       The name of the gene to ignore (only used with the -ignore_snps_from_feature flag)
+  -o, --out                              Name of output file
-  -n  Name of your database
-  -a  The gene you like to remove from analysis
-  -o  Output file, in fasta format
+  Usage:
+  snp-search -O -F -e -n my_snp_db.sqlite3 -a phage,insertion,transposon -r -o snps_without_phages.fasta
-  options:
+  - Optionally, you can add the following options to generate a phylogenetic tree from the resulting fasta file:
   -t  Generate SNP phylogeny
   -w  Output tree in Newick format
-  Usage (phage is used as the example gene):
-  snp-search -n my_snp_db.sqlite3 -a phage -o snps_sequences_without_phage.fasta -t -w snps_sequences_without_phage.nwk
+  Usage:
+  snp-search -O -F -e -n my_snp_db.sqlite3 -a phage,insertion,transposon -r -t -w -o snps_without_phages.fasta
   The algorithm FastTree is used to generate the nwk file.  FastTree can be downloaded from http://www.microbesonline.org/fasttree/#Install (see above)
-  3- Output database (snp-search -out_file)
+  - Output all SNPs with information.  Information for each SNP includes whether the SNP is synonymous or non-synonymous, gene function, whether it is a pseudogene and other useful information.  These information will be tab-seperated.
-  You need the following parameters:
-  -n  Name of your database
-  -o  Output file containing the database in fasta format
+  -E, --info                             Output various information about SNPs
+  -o, --out                              Name of output file
+  Usage:
+  snp-search -O -T -E -n my_snp_db.sqlite3 o snps_all_with_info.txt
 == View database in Unix or in a GUI
 Your database will be in sqlite3 format.  If you like to view your table(s) and perform direct queries you can type
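
The README hunk above documents a range-list format for ignoring SNP positions (e.g. 10..500,2000..2500), but the parsing code is not part of the hunks shown here. A minimal sketch of how such a string could be turned into Ruby ranges follows. The helper names `parse_position_ranges` and `ignored_position?` are hypothetical and are not taken from snp-search; the actual gem may handle this differently.

```ruby
# Hypothetical helpers for illustration only; not part of the snp-search source.

# Turns a string such as "10..500,2000..2500" into an array of Range objects.
def parse_position_ranges(spec)
  spec.split(',').map do |part|
    first, last = part.strip.split('..', 2)
    raise ArgumentError, "bad range: #{part.inspect}" if first.nil? || last.nil?
    # Integer() raises ArgumentError on non-numeric input instead of returning 0.
    Range.new(Integer(first), Integer(last))
  end
end

# True when a SNP position falls inside any of the ignored ranges.
def ignored_position?(position, ranges)
  ranges.any? { |range| range.cover?(position) }
end

ranges = parse_position_ranges("10..500,2000..2500")
puts ignored_position?(250, ranges)   # position inside the first range
puts ignored_position?(1000, ranges)  # position between the two ranges
```

Using `Integer()` rather than `to_i` is a deliberate choice in this sketch: a malformed entry fails loudly instead of silently becoming position 0.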

data/Rakefile CHANGED

@@ -15,11 +15,11 @@ require 'jeweler'
 Jeweler::Tasks.new do |gem|
   # gem is a Gem::Specification... see http://docs.rubygems.org/read/chapter/20 for more options
   gem.name = "snp-search"
-  gem.homepage = "http://github.com/hpa-bioinformatics/snp-search"
+  gem.homepage = "http://github.com/phe-bioinformatics/snp-search"
   gem.license = "MIT"
   gem.summary = %Q{Tool for generating SNP database}
   gem.description = %Q{Use the snp-search tool to create, import, manipulate and query your SNP database}
-  gem.email = "ali.al-shahib@hpa.org.uk"
+  gem.email = "ali.al-shahib@phe.gov.uk"
   gem.authors = ["Ali Al-Shahib", "Anthony Underwood"]
   gem.executables = ["snp-search"]
   # dependencies defined in Gemfile

data/VERSION CHANGED

@@ -1 +1 @@
- 2.2.0
+ 2.3.0

data/bin/snp-search CHANGED

@@ -1,329 +1,242 @@
 require 'snp-search'
-require 'snp_db_connection'
-require 'snp_db_models'
-require 'snp_db_schema'
+require '../lib/snp_db_connection.rb'
+require '../lib/snp_db_models.rb'
+require '../lib/snp_db_schema.rb'
+require '../lib/output_information_methods.rb'
 require 'activerecord-import'
 require 'slop'
 opts = Slop.parse do
-  banner "\nruby snp-search [-create] [-query] [-output] [-n <sqlite3>] [options]*"
+  banner "\nruby snp-search [-create] [-output] [-n <sqlite3>] [options]*"
   separator ''
   on :C, :create, 'Create database'
-  on :Q, :query, 'Query database'
-  on :O, :output, 'Output options'
-  separator ''
-  # separator 'README file: https://github.com/hpa-bioinformatics/snp-search/blob/master/README.rdoc'
-  # separator 'The following command must be used when using -create, or -query or -out_file'
-  on :n, :name=, 'Name of database, Required'
+  on :O, :output, 'Output a process'
+  # separator ''
+  # # separator 'README file: https://github.com/hpa-bioinformatics/snp-search/blob/master/README.rdoc'
+  # # separator 'The following command must be used when using -create, or -query or -out_file'
+  # on :n, :name=, 'Name of database, Required'
   separator ''
-  separator '-create options'
+  separator '-create [options]'
   on :d, :database_reference_file=, 'Reference genome file, in gbk or embl file format, Required', true
   on :v, :vcf_file=, 'variant call format (vcf) file, Required', true
-  on :c, :cuttoff_snp=, 'SNP quality cutoff, (default = 90)', :as => :int, :default => 90
+  on :n, :name=, 'Name of database, Required'
+  on :A, :cuttoff_ad=, 'AD ratio cutoff (default 0.9)', :as => :int, :default => 0.9
+  separator ''
+  separator '-output -snps_from_feature -n db_name [options] [-fasta] [-tabular]'
+  on :F, :fasta, 'output fasta file format'
+  on :T, :tabular, 'output tabular file format'
+  on :c, :cuttoff_snp_qual=, 'SNP quality cutoff, (default = 90)', :as => :int, :default => 90
   on :g, :cuttoff_genotype=, 'Genotype quality cutoff (default = 30)', :as => :int,  :default => 30
+  on :S, :snps_from_feature, 'SNPs from specified features in the database (if you do not want to ignore any SNPs, just use this option with -n -F/T -o)'
+  on :r, :remove_non_informative_snps, 'Only output informative SNPs. Only used with -e option'
+  on :e, :ignore_snps_in_range=, 'A list of position ranges to ignore e.g 10..500,2000..2500. Only used with -e option'
+  on :R, :ignore_strains=, 'A list of strains to ignore (seperate by comma e.g. S1,S4,S8 ). Only used with -e option'
+  on :I, :ignore_snps_on_annotation=, 'The name of the feature to ignore.'
+  on :o, :out=, 'Name of output file, Required'
+  on :t, :tree, 'Generate SNP phylogeny (only used with -fasta option)'
+  on :p, :fasttree_path=, 'Full path to the FastTree tool (e.g. /usr/local/bin/FastTree. only used with -tree option)'
   separator ''
-  separator '-query options'
+  separator '-output -unique_snps -n db_name [-fasta] [-tabular] [options]'
+  on :c, :cuttoff_snp_qual=, 'SNP quality cutoff, (default = 90)', :as => :int, :default => 90
+  on :g, :cuttoff_genotype=, 'Genotype quality cutoff (default = 30)', :as => :int,  :default => 30
   on :u, :unique_snps, 'Query for unique snps in the database'
-  on :r, :not_include_snps_from_gene, 'Remove SNPs from specified gene from database'
-  on :s, :strain=, 'The strains/samples you like to query, Required'
-  on :a, :annotation=, 'The gene you like to remove from analysis'
+  on :s, :strain=, 'The strains/samples you like to query (only used with -unique_snps flag)'
+  on :o, :out=, 'Name of output file, Required'
   separator ''
-  separator '-output [-fasta] [-syn] options'
-  on :f, :fasta, 'output fasta file'
-  on :S, :syn, 'output tab-delimited file with synonymous and non-synonymous info'
-  on :o, :out=, 'Name of output file'
-  on :t, :tree, 'Generate SNP phylogeny'
-  on :w, :nwk_out=, 'Name of output tree in Newick format'
+  separator '-output -info -n db_name [-fasta] [-tabular] [options]'
+  on :i, :info, 'Output various information about SNPs'
+  on :c, :cuttoff_snp_qual=, 'SNP quality cutoff, (default = 90)', :as => :int, :default => 90
+  on :g, :cuttoff_genotype=, 'Genotype quality cutoff (default = 30)', :as => :int,  :default => 30
+  on :t, :tree, 'Generate SNP phylogeny (only used with -fasta option)'
+  on :w, :nwk_out=, 'Name of output tree in Newick format (only used with -tree option)'
+  on :o, :out=, 'Name of output file, Required'
 end
-# opts.end
 ###########################################################
 # CREATING A DATABASE
 if opts[:create]
-    # puts opts[:cuttoff_snp].to_i
-      error_msg = ""
-      error_msg += "-n: \t Name of your database\n" unless opts[:name]
-      error_msg += "-d: \t Reference genome file, in gbk or embl file format\n" unless opts[:database_reference_file]
-      error_msg += "-v: \t .vcf file\n" unless opts[:vcf_file]
-      error_msg_optional = ""
-      error_msg_optional += "-c: \tSNP quality cutoff, (default = 90)\n"
-      error_msg_optional += "-g: \tGenotype quality cutoff (default = 30)\n"
-        unless error_msg == ""
-          puts "Please provide the following required fields:"
-          puts error_msg
-          puts "Optional fields:"
-          puts error_msg_optional
-          puts opts.help unless opts.empty?
-          exit
-        end
-      abort "#{opts[:database_reference_file]} file does not exist!" unless File.exist?(opts[:database_reference_file])
-      abort "#{opts[:vcf_file]} file does not exist!" unless File.exist?(opts[:vcf_file])
-    # Name of your database
-    establish_connection(opts[:name])
-    # Schema will run here
-   db_schema
-    ref = opts[:database_reference_file]
-    sequence_format = guess_sequence_format(ref)
-          case sequence_format
-          when :genbank
-            sequence_flatfile = Bio::FlatFile.open(Bio::GenBank,opts[:database_reference_file]).next_entry
-          when :embl
-            sequence_flatfile = Bio::FlatFile.open(Bio::EMBL,opts[:database_reference_file]).next_entry
-          else
-            puts "All sequence files should be in genbank or embl format"
-            exit
-          end
+  # puts opts[:cuttoff_snp_qual].to_i
+    error_msg = ""
-      # path for vcf file here
-      vcf_mpileup_file = opts[:vcf_file]
+    error_msg += "-n: \t Name of your database\n" unless opts[:name]
+    error_msg += "-d: \t Reference genome file, in gbk or embl file format\n" unless opts[:database_reference_file]
+    error_msg += "-v: \t .vcf file\n" unless opts[:vcf_file]
-      # The populate_features_and_annotations method populates the features and annotations.  It uses the embl/gbk file.
-     populate_features_and_annotations(sequence_flatfile)
+    error_msg_optional = ""
-      #The populate_snps_alleles_genotypes method populates the snps, alleles and genotypes.  It uses the vcf file, and if specified, the SNP quality cutoff and genotype quality cutoff
+    error_msg_optional += "-c: \tSNP quality cutoff, (default = 90)\n"
+    error_msg_optional += "-g: \tGenotype quality cutoff (default = 30)\n"
+      unless error_msg == ""
+        puts "Please provide the following required fields:"
+        puts error_msg
+        puts "Optional fields:"
+        puts error_msg_optional
+        puts opts.help unless opts.empty?
+        exit
+      end
+    abort "#{opts[:database_reference_file]} file does not exist!" unless File.exist?(opts[:database_reference_file])
+    abort "#{opts[:vcf_file]} file does not exist!" unless File.exist?(opts[:vcf_file])
-      populate_snps_alleles_genotypes(vcf_mpileup_file, opts[:cuttoff_snp], opts[:cuttoff_genotype])
-      # puts "populate_snps_alleles_genotypes(#{vcf_mpileup_file}, #{opts[:cuttoff_snp]}, #{opts[:cuttoff_genotype]}.to_i)"
+  # Name of your database
+  establish_connection(opts[:name])
-###########################################################
+  # Schema will run here
+  db_schema
-# QUERYING THE DATABASE
-elsif opts [:query]
-  #FIND UNIQUE SNPS
-  if opts[:unique_snps]
+  ref = opts[:database_reference_file]
-        error_msg = ""
+  sequence_format = guess_sequence_format(ref)
-        error_msg += "-n: \t Name of your database\n" unless opts[:name]
-        error_msg += "-s: \t List of strains you like to query\n" unless opts[:strain]
-        unless error_msg == ""
-          puts "Please provide the following required fields:"
-          puts error_msg
-          puts opts.help unless opts.empty?
+        case sequence_format
+        when :genbank
+          sequence_flatfile = Bio::FlatFile.open(Bio::GenBank,opts[:database_reference_file]).next_entry
+        when :embl
+          sequence_flatfile = Bio::FlatFile.open(Bio::EMBL,opts[:database_reference_file]).next_entry
+        else
+          puts "All sequence files should be in genbank or embl format"
           exit
         end
-        abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
-        abort "#{opts[:strain]} file does not exist!" unless File.exist?(opts[:strain])
-      establish_connection(opts[:name])
-      strains = []
-        File.read(opts[:strain]).each_line do |line|
-          strains << line.chop
-        end
+  # The populate_features_and_annotations method populates the features and annotations.  It uses the embl/gbk file.
+  populate_features_and_annotations(sequence_flatfile)
-      # puts find_shared_snps(strains)
-      # exit
-      gas_snps = find_shared_snps(strains)
+  #The populate_snps_alleles_genotypes method populates the snps, alleles and genotypes.  It uses the vcf file, and if specified, the SNP quality cutoff and genotype quality cutoff
-      gas_snps.each do |snp|
-        puts "The number of unique snps are #{snp.id}"
-      end
+  populate_snps_alleles_genotypes(opts[:vcf_file], opts[:cuttoff_ad])
-################################################################
-  # REMOVE SNPS ASSOCIATED WITH SPECIFIC GENES
-  elsif opts[:not_include_snps_from_gene]
+###########################################################
-      error_msg = ""
+# QUERYING THE DATABASE
+elsif opts[:output]
-        error_msg += "-n: \t Name of your database\n" unless opts[:name]
-        error_msg += "-o: \t name of your output file\n" unless opts[:out]
-        error_msg += "-a: \t name of the gene that you like to remove from the database\n" unless opts[:annotation]
-        error_msg_optional = ""
+  error_msg = ""
+  error_msg += "-S: \t SNPs from specified features in the database OR\n-u: \t Query for unique snps in the database OR\n-i: \t Information on all SNPs\n" unless opts[:snps_from_feature] || opts[:unique_snps] || opts[:info]
-        error_msg_optional += "-tree: \t Construct tree from output\n" unless  opts[:tree]
-        error_msg_optional += "-nwk_out: Name of Newick output file(use only when-tree option used)\n"  unless opts[:nwk_out]
+  unless error_msg == ""
+    puts "Please provide the following required fields:"
+    puts error_msg
+    puts opts.help unless opts.empty?
+    exit
+  end
-        unless error_msg == ""
-          puts "Please provide the following required fields:"
-          puts error_msg
-          puts "Optional fields:"
-          puts error_msg_optional
-          puts opts.help unless opts.empty?
-          exit
-        end
+  if opts[:snps_from_feature]
-        abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
-      # annotation = opts[:annotation]
-     establish_connection(opts[:name])
-      # Getting list of strains from database
-      strains = Strain.all
-      sequence_hash = Hash.new
-      # create a sequence hash
-      # hash key is strain_id, loop through strain_id
-      # create an empty array
-      strains.each do |strain|
-        sequence_hash[strain.id] = Array.new
-      end
+    error_msg = ""
-      # output opened for data input
-      output = File.open("#{opts[:out]}", "w")
-      # Perform query
-      snps = Snp.includes(:alleles => :genotypes).find_by_sql("SELECT snps.* FROM snps INNER JOIN features ON features.id = snps.feature_id WHERE features.id NOT IN (select distinct features.id FROM features INNER JOIN annotations ON annotations.feature_id = features.id WHERE annotations.value LIKE '%#{opts[:annotation]}%')")
-        i = 0
-        puts "Your Query is submitted and is being processed......."
-        snps.each do |snp|
-          # puts snp.inspect
-          i += 1
-          puts "Total number of SNPs generated so far: #{i}" if i % 100 == 0
-           ActiveRecord::Base.transaction do
-                snp.alleles.each do |allele|
-                  # puts allele.inspect
-                  allele.genotypes.each do |genotype|
-                    #push bases to hash
-                    sequence_hash[genotype.strain_id] << allele.base
-                  end
-                end
-            end
-        end
+    error_msg += "-n: \t Name of your database\n" unless opts[:name]
+    error_msg += "-o: \t name of your output file\n" unless opts[:out]
+    error_msg += "-F: \t Fasta output OR\n-T: \t Tabular output" unless opts[:fasta] || opts[:tabular]
+    error_msg_optional = ""
-    #generate FASTA file
-    strains.each do |strain|
-      output.print ">#{strain.name}\n" , sequence_hash[strain.id].join("")
-      output.puts
+    error_msg_optional += "-I,\t --ignore_snps_on_annotation: ignore SNPs from specified features in the database\n" unless opts[:ignore_snps_on_annotation]
+    error_msg_optional += "-R,\t --ignore_strains: A list of strains to ignore\n" unless  opts[:ignore_strains]
+    error_msg_optional += "-i,\t --ignore_snps_in_range: A list of position ranges to ignore e.g 10..500,2000..2500\n" unless  opts[:ignore_snps_in_range]
+    error_msg_optional += "-c,\t --cuttoff_snp_qual: cuttoff for SNP Quality\n"  unless  opts[:cuttoff_snp_qual]
+    error_msg_optional += "-g,\t --cuttoff_genotype: cuttoff for Genotype Quality\n"  unless  opts[:cuttoff_genotype]
+    error_msg_optional += "-r,\t --remove_non_informative_snps: Only output informative SNPs\n"  unless  opts[:remove_non_informative_snps]
+    error_msg_optional += "-t,\t --tree: Construct tree from output\n" unless  opts[:tree]
+    error_msg_optional += "-w,\t --nwk_out: Name of Newick output file(use only when-tree option used)\n"  unless opts[:nwk_out]
+    unless error_msg == ""
+      puts "Please provide the following required fields:"
+      puts error_msg
+      puts "Optional fields:"
+      puts error_msg_optional
+      # Added this here as it wont appear here in error_msg_optional as its set as default.
+      puts "-c,\t --cuttoff_snp_qual: cuttoff for SNP Quality (default 90)\n"
+      puts "-g,\t --cuttoff_genotype: cuttoff for Genotype Quality (default 30)\n"
+      puts opts.help unless opts.empty?
+      exit
     end
-    # GENERATE TREE FROM FASTA FILE
-    if opts[:tree]
-      `FastTree -fastest -nt #{opts[:out]} > #{opts[:nwk_out]}`
-    end
+    abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
+    establish_connection(opts[:name])
-  else
-    puts "use -unique_snps or -not_include_snps_from_gene query options"
+    get_snps(opts[:out], opts[:ignore_snps_on_annotation], opts[:ignore_snps_in_range], opts[:ignore_strains], opts[:remove_non_informative_snps], opts[:fasta], opts[:tabular], opts[:cuttoff_genotype], opts[:cuttoff_snp_qual], opts[:tree], opts[:fasttree_path])
   end
-# ##############################################################
+####################################################################################################
+  #FIND UNIQUE SNPS
+  if opts[:unique_snps]
-# OUTPUT DATABASE IN FASTA FORMAT
-elsif opts[:output]
-  if opts[:fasta]
     error_msg = ""
-        error_msg += "-n: \t Name of your database\n" unless opts[:name]
-        error_msg += "-o: \t name of your output file (in FASTA format)\n" unless opts[:out]
-    error_msg_optional = ""
-        error_msg_optional += "-tree: \t Construct tree from output\n" unless  opts[:tree]
-        error_msg_optional += "-nwk_out: Name of Newick output file(use only when-tree option used)\n"  unless opts[:nwk_out]
-        unless error_msg == ""
-          puts "Please provide the following required fields:"
-          puts error_msg
-          puts "Optional fields:"
-          puts error_msg_optional
-          puts opts.help unless opts.empty?
-          exit
-        end
-        abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
+    error_msg += "-n: \t Name of your database\n" unless opts[:name]
+    error_msg += "-s: \t List of strains you like to query\n" unless opts[:strain]
+    error_msg += "-o: \t Name of the output file\n" unless opts[:out]
+    unless error_msg == ""
+      puts "Please provide the following required fields:"
+      puts error_msg
+      puts "Optional fields:"
+      # Added this here as it wont appear here in error_msg_optional as its set as default.
+      puts "-c,\t --cuttoff_snp_qual: cuttoff for SNP Quality (default 90)\n"
+      puts "-g,\t --cuttoff_genotype: cuttoff for Genotype Quality (default 30)\n"
+      puts opts.help unless opts.empty?
+      exit
+    end
+    abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
+    abort "#{opts[:strain]} file does not exist!" unless File.exist?(opts[:strain])
     establish_connection(opts[:name])
-      # Getting list of strains from database
-      strains = Strain.all
-      sequence_hash = Hash.new
-      # create a sequence hash
-      # hash key is strain_id, loop through strain_id
-      # create an empty array
-      strains.each do |strain|
-        sequence_hash[strain.id] = Array.new
-      end
-      output = File.open("#{opts[:out]}", "w")
-      # Select all snps
-      snps = Snp.all
-        i = 0
-        puts "Your out file is being prepared......."
-        snps.each do |snp|
-          i += 1
-          puts "Total number of SNPs outputted so far: #{i}" if i % 100 == 0
-         ActiveRecord::Base.transaction do
-              snp.alleles.each do |allele|
-                # puts allele.inspect
-                allele.genotypes.each do |genotype|
-                  #push bases to hash
-                  sequence_hash[genotype.strain_id] << allele.base
-                end
-              end
-          end
-        end
-    puts sequence_hash
-    exit
-    #generate FASTA file
-    strains.each do |strain|
-      output.print ">#{strain.name}\n" , sequence_hash[strain.id].join("")
-      output.puts
-    end
-    if opts[:tree]
-      # puts "FastTree -fastest -nt #{opts[:out]} > #{opts[:w]}"
-       `FastTree -fastest -nt #{opts[:out]} > #{opts[:w]}`
-    end
+    strains = []
+      File.read(opts[:strain]).each_line do |line|
+        strains << line.chop
+      end
+    # find_unique_snps defined in bin/snp-search.rb
+    find_unqiue_snps(strains, opts[:out], opts[:cuttoff_genotype], opts[:cuttoff_snp_qual])
   end
+##############################################################
+  if opts[:info]
-    #########################################
-  if opts[:syn]
     error_msg = ""
-        error_msg += "-n option: \t the name of your database\n" unless opts[:name]
-        error_msg += "-d option: \t the reference file in gbk format\n" unless opts[:database_reference_file]
-        unless error_msg == ""
-          puts "Please provide the following required fields:"
-          puts error_msg
-          puts opts.help unless opts.empty?
-          exit
-        end
+    error_msg += "-n: \t the name of your database\n" unless opts[:name]
+    error_msg += "-o: \t name of your output file (in tab-delimited format)\n" unless opts[:out]
+    unless error_msg == ""
+      puts "Please provide the following required fields:"
+      puts error_msg
+      puts "Optional fields:"
+      # Added this here as it wont appear here in error_msg_optional as its set as default.
+      puts "-c,\t --cuttoff_snp_qual: cuttoff for SNP Quality (default 90)\n"
+      puts "-g,\t --cuttoff_genotype: cuttoff for Genotype Quality (default 30)\n"
+      puts opts.help unless opts.empty?
+      exit
+    end
-        abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
-        abort "#{opts[:database_reference_file]} vcf file does not exist!" unless File.exist?(opts[:database_reference_file])
+    abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
     establish_connection(opts[:name])
-    ref = opts[:database_reference_file]
-    synonymous(ref)
-  end
+    #information defined in bin/snp-search.rb
+    information(opts[:out], opts[:cuttoff_genotype], opts[:cuttoff_snp_qual])
+  end
 else
-  puts opts.help
+   puts opts.help
 end
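
The unique-SNP branch above reads the strain list with `line.chop`, which removes the last character of every line whether or not that character is a newline. A minimal sketch of the same read using `strip` is shown below, so that a final line without a trailing newline keeps its full name. The helper name `read_strain_names` is hypothetical and not part of snp-search, and the strain names are the S1,S4,S8 examples from the option help text.

```ruby
require 'tempfile'

# Hypothetical helper for illustration only; not part of the snp-search source.
# Reads one strain name per line, trimming whitespace and skipping blank lines.
def read_strain_names(path)
  File.readlines(path).map(&:strip).reject(&:empty?)
end

names = nil
chopped = nil

Tempfile.create('strains') do |file|
  file.write("S1\nS4\nS8") # no newline after the last name
  file.flush
  names = read_strain_names(file.path)
  # Same read as in the diff, for comparison.
  chopped = File.read(file.path).each_line.map(&:chop)
end

p names    # ["S1", "S4", "S8"]
p chopped  # ["S1", "S4", "S"]
```

The comparison shows why the choice matters: with `chop`, the last strain in a file that lacks a trailing newline loses a character and would no longer match the name stored in the database.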