
snp-search 0.34.0 → 1.0.0

Files changed (10)

data/README.rdoc CHANGED

@@ -1,6 +1,6 @@
 = snp-search
-SNPsearch is a tool that manages SNP data and allows for data importing, manipulating, editing and complex querying of SNP data.  It can be used to evaluate the utility of SNPs for the assessment of genetic diversity between haploid strains and the management of genotype and phenotype data.  Once a query is performed, SNPsearch can be used to convert the selected SNP data into FASTA sequences.  SNPsearch is particularly useful in the analysis of phylogenetic trees that are based on SNP differences across whole core genomes.  Queries can be made to answer critical genomic questions such as the association of SNPs with particular phenotypes.
+SNPsearch is a tool that manages SNP data and allows for data importing, manipulating, editing and complex querying of SNP data.  It can be used to evaluate the utility of SNPs for the assessment of genetic diversity between haploid strains and the management of genotype and phenotype data.  Once the database is created, the user is provided with several query and output options. SNPsearch is particularly useful in the analysis of phylogenetic trees that are based on SNP differences across whole core genomes.  Queries can be made to answer critical genomic questions such as the association of SNPs with particular phenotypes.
 == Obtaining and installing the code
 SNPsearch is written in Ruby and operates in a Unix environment.  It is made available as a gem. See the github site for more information (https://github.com/hpa-bioinformatics/snp-search).
@@ -15,70 +15,79 @@ Not much, you just need:
 * Unix. Once snp-search is installed, all the necessary gems to run snp-search will also be installed from Rubygems (note that Rubygems requires admin privileges.  If you do not have admin privileges then we suggest you install RVM: (http://beginrescueend.com/rvm/install/) and then gem install snp-search).
 * ruby version 1.8.7 and above.
+* Optional: FastTree.  If you require a tree output in Newick format, you must install FastTree from http://www.microbesonline.org/fasttree/#Install.  You must specify the path of the executable in your .bashrc or .profile file as snp-search will run the command as just 'FastTree' and will not know where FastTree is if it is not specified in your .bashrc or .profile file.
 Thats it!
 == Running snp-search
-To run snp-search, you need two files:
+1- Creating the database (snp-search -create)
+  Two files are needed to create the SQLite3 database:
-1- Variant Call Format (.vcf) file (which contains the SNP information)
+  1- Variant Call Format (.vcf) file (which contains the SNP information)
-2- Your database reference genome that you used to generate your .vcf file (in genbank or embl format, the script will automatically detect the format).
+  2- Your database reference genome that you used to generate your .vcf file (in genbank or embl format, the script will automatically detect the format).
-Once you have these files ready, you may run snp-search with the following options:
+You need the following parameters:
-  -V	Enable verbose mode
   -n	Name of your database
-  -v	.vcf file	Required
-  -d	Database Reference genome (The same file that was used in generating the .vcf file).  This should be in genbank or embl format.	Required
+  -v	.vcf file
+  -d	Database Reference genome (The same file that was used in generating the .vcf file).  This should be in genbank or embl format.
+  Other options:
   -c	SNP quality score cutoff.  A Phred-scaled quality score. High quality scores indicate high confidence calls. Optional, default = 90 (out of 100)
-  -t	Genotype Quality score cutoff. Phred-scaled quality score that the genotype is true.	Optional, default = 30
+  -g	Genotype Quality score cutoff. Phred-scaled quality score that the genotype is true.	Optional, default = 30
   -h	help message
-Usage:
-  snp-search -n my_snp_db.sqlite3 -d my_ref.gbk -v my_vcf_file.vcf
+  Usage:
+    snp-search -create -n my_snp_db.sqlite3 -d my_ref.gbk -v my_vcf_file.vcf
-Note: The strain names in your database will be taken from your vcf file so make sure they are named appropriately in your vcf file.
+  Note: The strain names in your database will be taken from your vcf file so make sure they are named appropriately in your vcf file.
-== Output
-The output is your database in sqlite3 format.  If you like to view your table(s) and perform queries you can type
-  sqlite3 snp_db.sqlite3
+2- Querying the Database (snp-search -query)
-Alternatively, you may download a SQL tool to view your database (e.g. SQLite sorcerer).
+  Two queries are currently scripted in SNPsearch:
-Also, depending on the query, a concatenated SNP FASTA file may be outputed (see below).
+  1- genes_query: This option queries the database and selects the number of unique SNPs within the list of the strains/samples provided.  The output is the number of unique SNPs.
-== Examples
+  You need the following parameters:
-We have included two example queries that you may find useful:
+  -n  Name of your database
+  -s  The strains/samples you like to query
-* Example1: This script queries the database to select only those SNPs not found in phage related genes. These SNPs were used to make a concatenated SNP multiple alignment file (FASTA format).  This is a way of removing a set of genes that are not needed for the SNP analysis. You may use this script to do other SQL queries that result in a FASTA output.
+  Usage:
+    snp-search -n my_snp_db.sqlite3 -s list_of_my_strains.txt
-Usage:
+  2- remove_genes: This option queries the database to select only those SNPs not found in a specified gene. These SNPs are used to make a concatenated SNP multiple alignment file (FASTA format).  This is a way of removing a set of genes (likely to be mobile element genes) that are not needed for SNP analysis.  The user has the option of generating a core SNP tree Newick file for SNP phylogeny.
-  ruby example1.rb -D your_db_name.sqlite3 -s list_of_your_species.txt -o output.fasta
+  You need the following parameters:
-options:
+  -n  Name of your database
+  -a  The gene you like to remove from analysis
+  -o  Output file, in fasta format
-  -V,	Enable verbose mode
-  -D,	The name of the database you like to query, Required
-  -o,	output file, in fasta format
-  -s,	The strains/samples you like to query, Required
-  -a,	The gene you like to remove from analysis
-  -h,	Print this help message
+  options:
+  -t  Generate SNP phylogeny
+  -w  Output tree in Newick format
-* Example2: This script queries the database and selects the number of unique SNPs within the list of the strains/samples provided.  The output is the number of unique SNPs.
+  Usage (phage is used as the example gene):
+  snp-search -n my_snp_db.sqlite3 -a phage -o snps_sequences_without_phage.fasta -t -w snps_sequences_without_phage.nwk
-Usage:
+  The algorithm FastTree is used to generate the nwk file.  FastTree can be downloaded from http://www.microbesonline.org/fasttree/#Install (see above)
-  ruby example2.rb -D your_db_name.sqlite3 -s list_of_your_species.txt
+  3- Output database (snp-search -out_file)
-options:
+  You need the following parameters:
-  -V,	Enable verbose mode
-  -D,	The name of the database you like to query, Required
-  -s,	The strains/samples you like to query, Required
-  -h,	Print this help message
+  -n  Name of your database
+  -o  Output file containing the database in fasta format
+== View database in Unix or in a GUI
+Your database will be in sqlite3 format.  If you like to view your table(s) and perform direct queries you can type
+  sqlite3 snp_db.sqlite3
+Alternatively, you may download a SQL tool to view your database (e.g. SQLite sorcerer).
 == Contact
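
Note on the FastTree requirement: the README hunk above says snp-search runs the tree builder as plain 'FastTree', so the executable has to be discoverable through the shell search path. A minimal Ruby sketch of a pre-flight check for that condition follows. The helper name `executable_on_path?` is hypothetical and is not part of snp-search.

```ruby
# Hypothetical helper (not part of snp-search): returns true when an
# executable file called `name` exists in one of the directories in `path`.
def executable_on_path?(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    next false if dir.empty?
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

unless executable_on_path?("FastTree")
  warn "FastTree not found on PATH; the -t/-w tree options would likely fail."
end
```

A check like this could run before the tree step so that a missing FastTree is reported clearly instead of surfacing as an empty Newick file.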

data/VERSION CHANGED

@@ -1 +1 @@
-0.34.0
+1.0.0

data/bin/snp-search CHANGED

@@ -2,67 +2,266 @@ require 'snp-search'
 require 'snp_db_connection'
 require 'snp_db_models'
 require 'snp_db_schema'
+require 'activerecord-import'
+# gem "slop", "~> 3.1.0"
 gem "slop", "~> 2.4.0"
 require 'slop'
-opts = Slop.new :help do
-  banner "ruby snp-search [OPTIONS]"
+opts = Slop.new do
+  # separator 'test'
-  on :V, :verbose, 'Enable verbose mode'
-  on :n, :name=, 'Name of database, Required', true
-  on :d, :database_reference_file=, 'Reference genome file, in gbk or embl file format, Required', true
-  on :v, :vcf_file=, '.vcf file, Required', true
-  on :c, :cuttoff_snp=, 'SNP quality cutoff, (default = 90)', :default => 90
-  on :t, :cuttoff_genotype=, 'Genotype quality cutoff (default = 30)', :default => 30
+  banner "\nruby snp-search [OPTIONS]"
+  on :C, :create, 'Create database'
+  on :Q, :query, 'Query database'
+  on :O, :out_file, 'Output the database to a file'
+  # separator ''
+  # separator 'README file: https://github.com/hpa-bioinformatics/snp-search/blob/master/README.rdoc'
+  # separator 'The following command must be used when using -create, or -query or -out_file'
+  on :n, :name=, 'Name of database, Required'
+  # separator ''
+  # separator '-create options'
+  on :d, :database_reference_file, 'Reference genome file, in gbk or embl file format, Required', true
+  on :v, :vcf_file, '.vcf file, Required', true
+  on :c, :cuttoff_snp, 'SNP quality cutoff, (default = 90)', :default => 90
+  on :g, :cuttoff_genotype, 'Genotype quality cutoff (default = 30)', :default => 30
+  # separator ''
+  # separator '-query options'
+  on :G, :genes_query, 'Query for unique genes in the database'
+  on :R, :remove_genes, 'Remove set of genes from database and create FASTA file'
+  on :s, :strain=, 'The strains/samples you like to query, Required'
+  on :a, :annotation=, 'The gene you like to remove from analysis'
+  on :o, :output=, 'output file, in fasta format'
+  on :t, :tree, 'Generate SNP phylogeny'
+  on :w, :tree_nwk_output=, 'output tree in Newick format'
+  on :S, :syn, 'syn'
 end
 opts.parse
-  error_msg = ""
+###########################################################
-  error_msg += "You must supply the -n option, it's a required field\n" unless opts[:name]
-  error_msg += "You must supply the -d option, it's a required field\n" unless opts[:database_reference_file]
-  error_msg += "You must supply the -v option, it's a required field" unless opts[:vcf_file]
+# CREATING A DATABASE
+if opts[:create]
-  unless error_msg == ""
-    puts error_msg
-    puts opts.help unless opts.empty?
-    exit
+      error_msg = ""
+      error_msg += "-n option: \t the name of your database\n" unless opts[:name]
+      error_msg += "-d option: \t reference genome file, in gbk or embl file format\n" unless opts[:database_reference_file]
+      error_msg += "-v option: \t .vcf file\n" unless opts[:vcf_file]
+      unless error_msg == ""
+        puts "Please provide the following required fields:"
+        puts error_msg
+        puts opts.help unless opts.empty?
+        exit
+      end
+      abort "#{opts[:database_reference_file]} file does not exist!" unless File.exist?(opts[:database_reference_file])
+      abort "#{opts[:vcf_file]} file does not exist!" unless File.exist?(opts[:vcf_file])
+    # Name of your database
+    establish_connection(opts[:name])
+    # Schema will run here
+    db_schema
+    ref = opts[:database_reference_file]
+    sequence_format = guess_sequence_format(ref)
+          case sequence_format
+          when :genbank
+            sequence_flatfile = Bio::FlatFile.open(Bio::GenBank,opts[:database_reference_file]).next_entry
+          when :embl
+            sequence_flatfile = Bio::FlatFile.open(Bio::EMBL,opts[:database_reference_file]).next_entry
+          else
+            puts "All sequence files should be in genbank or embl format"
+            exit
+          end
+      # path for vcf file here
+      vcf_mpileup_file = opts[:vcf_file]
+      # The populate_features_and_annotations method populates the features and annotations.  It uses the embl/gbk file.
+      populate_features_and_annotations(sequence_flatfile)
+      #The populate_snps_alleles_genotypes method populates the snps, alleles and genotypes.  It uses the vcf file, and if specified, the SNP quality cutoff and genotype quality cutoff
+      populate_snps_alleles_genotypes(vcf_mpileup_file, opts[:cuttoff_snp].to_i, opts[:cuttoff_genotype].to_i)
+###########################################################
+# QUERYING THE DATABASE
+elsif opts [:query]
+  #FIND UNIQUE SNPS
+  if opts[:genes_query]
+        error_msg = ""
+        error_msg += "-n option, \t the name of your database\n" unless opts[:name]
+        error_msg += "-s option, \t list of strains you like to query\n" unless opts[:strain]
+        unless error_msg == ""
+          puts "Please provide the following required fields:"
+          puts error_msg
+          puts opts.help unless opts.empty?
+          exit
+        end
+        abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
+        abort "#{opts[:strain]} file does not exist!" unless File.exist?(opts[:strain])
+      establish_connection(opts[:name])
+      strains = []
+        File.read(opts[:strain]).each_line do |line|
+          strains << line.chop
+        end
+      # puts find_shared_snps(strains)
+      # exit
+      gas_snps = find_shared_snps(strains)
+      gas_snps.each do |snp|
+        puts "The number of unique snps are #{snp.id}.size"
+      end
+################################################################
+  # REMOVE SNPS ASSOCIATED WITH SPECIFIC GENES
+  elsif opts[:remove_genes]
+      error_msg = ""
+        error_msg += "-n option: \t the name of your database\n" unless opts[:name]
+        error_msg += "-o option: \t name of your output file\n" unless opts[:output]
+        error_msg += "-a option: \t name of the gene that you like to remove from the database\n" unless opts[:annotation]
+        unless error_msg == ""
+          puts "Please provide the following required fields:"
+          puts error_msg
+          puts opts.help unless opts.empty?
+          exit
+        end
+        abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
+      # annotation = opts[:annotation]
+     establish_connection(opts[:name])
+      # Getting list of strains from database
+      strains = Strain.all
+      sequence_hash = Hash.new
+      # create a sequence hash
+      # hash key is strain_id, loop through strain_id
+      # create an empty array
+      strains.each do |strain|
+        sequence_hash[strain.id] = Array.new
+      end
+      # output opened for data input
+      output = File.open("#{opts[:output]}", "w")
+      # Perform query
+      snps = Snp.includes(:alleles => :genotypes).find_by_sql("SELECT snps.* FROM snps INNER JOIN features ON features.id = snps.feature_id WHERE features.id NOT IN (select distinct features.id FROM features INNER JOIN annotations ON annotations.feature_id = features.id WHERE annotations.value LIKE '%#{opts[:annotation]}%')")
+        i = 0
+        puts "Your Query is submitted and is being processed......."
+        snps.each do |snp|
+          # puts snp.inspect
+          i += 1
+          puts "Total number of SNPs generated so far: #{i}" if i % 100 == 0
+     ActiveRecord::Base.transaction do
+          snp.alleles.each do |allele|
+            # puts allele.inspect
+            allele.genotypes.each do |genotype|
+              #push bases to hash
+              sequence_hash[genotype.strain_id] << allele.base
+            end
+          end
+        end
+    end
+    #generate FASTA file
+    strains.each do |strain|
+      output.print ">#{strain.name}\n" , sequence_hash[strain.id].join("")
+      output.puts
+    end
+    if opts[:tree]
+      `FastTree -fastest -nt #{opts[:output]} > #{opts[:w]}`
+    end
   end
-  abort "#{opts[:database_reference_file]} file does not exist!" unless File.exist?(opts[:database_reference_file])
-  abort "#{opts[:vcf_file]} file does not exist!" unless File.exist?(opts[:vcf_file])
+##############################################################
-# Name of your database
-establish_connection(opts[:name])
+# OUTPUT DATABASE IN FASTA FORMAT
+elsif opts[:out_file]
+    error_msg = ""
-# Schema will run here
-db_schema
+        error_msg += "-n option: \t the name of your database\n" unless opts[:name]
+        error_msg += "-o option: \t name of your output file\n" unless opts[:output]
-ref = opts[:database_reference_file]
+        unless error_msg == ""
+          puts "Please provide the following required fields:"
+          puts error_msg
+          puts opts.help unless opts.empty?
+          exit
+        end
-sequence_format = guess_sequence_format(ref)
+        abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
-      case sequence_format
-      when :genbank
-        sequence_flatfile = Bio::FlatFile.open(Bio::GenBank,opts[:database_reference_file]).next_entry
-      when :embl
-        sequence_flatfile = Bio::FlatFile.open(Bio::EMBL,opts[:database_reference_file]).next_entry
-      else
-        puts "All sequence files should be of genbank or embl format"
-        exit
+  establish_connection(opts[:name])
+      # Getting list of strains from database
+      strains = Strain.all
+      sequence_hash = Hash.new
+      # create a sequence hash
+      # hash key is strain_id, loop through strain_id
+      # create an empty array
+      strains.each do |strain|
+        sequence_hash[strain.id] = Array.new
       end
-# path for vcf file here
-vcf_mpileup_file = opts[:vcf_file]
+      output = File.open("#{opts[:output]}", "w")
+      # Select all snps
+      snps = Snp.all
+        i = 0
+        puts "Your out file is being prepared......."
+        snps.each do |snp|
+          i += 1
+          puts "Total number of SNPs outputted so far: #{i}" if i % 100 == 0
-# The populate_features_and_annotations method populates the features and annotations.  It uses the embl/gbk file.
-populate_features_and_annotations(sequence_flatfile)
+     ActiveRecord::Base.transaction do
+          snp.alleles.each do |allele|
+            # puts allele.inspect
+            allele.genotypes.each do |genotype|
+              #push bases to hash
+              sequence_hash[genotype.strain_id] << allele.base
+            end
+          end
+        end
-#The populate_snps_alleles_genotypes method populates the snps, alleles and genotypes.  It uses the vcf file, and if specified, the SNP quality cutoff and genotype quality cutoff
-populate_snps_alleles_genotypes(vcf_mpileup_file, opts[:cuttoff_snp].to_i, opts[:cuttoff_genotype].to_i)
+    #generate FASTA file
+    strains.each do |strain|
+      output.print ">#{strain.name}\n" , sequence_hash[strain.id].join("")
+      output.puts
+    end
+    if opts[:tree]
+      `FastTree -fastest -nt #{opts[:output]} > #{opts[:w]}`
+    end
+  end
+else
+   puts opts.help
+end
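
Note on the FASTA step: both the remove_genes and out_file branches above collect one base per genotype into an array keyed by strain id, then print each strain as a FASTA record. The sketch below isolates that step in plain Ruby, assuming a `Struct` stand-in for the ActiveRecord `Strain` model and a flat list of `[strain_id, base]` pairs in place of the snp/allele/genotype traversal. The name `build_fasta` is hypothetical.

```ruby
# Stand-in for the ActiveRecord Strain model (assumption for this sketch).
Strain = Struct.new(:id, :name)

# calls: array of [strain_id, base] pairs in SNP order, standing in for
# the snp -> allele -> genotype loops in bin/snp-search.
def build_fasta(strains, calls)
  # Each strain id maps to the list of bases observed for that strain.
  sequence_hash = Hash.new { |hash, strain_id| hash[strain_id] = [] }
  calls.each { |strain_id, base| sequence_hash[strain_id] << base }
  # One FASTA record per strain: a header line, then the concatenated bases.
  strains.map { |s| ">#{s.name}\n#{sequence_hash[s.id].join}\n" }.join
end

strains = [Strain.new(1, "strainA"), Strain.new(2, "strainB")]
calls = [[1, "A"], [2, "G"], [1, "C"], [2, "C"]]
puts build_fasta(strains, calls)
```

Returning the FASTA text as a string, rather than printing inside the loop, keeps the assembly logic testable without a database or an open file handle.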

data/lib/snp-search.rb CHANGED

@@ -166,3 +166,14 @@ puts "Adding SNPs........"
 		snp.save
 	end
 end
+def find_shared_snps(strain_names)
+    *strain_names = strain_names
+   where_statement = strain_names.collect{|strain_name| "strains.name = '#{strain_name}' OR "}.join("").sub(/ OR $/, "")
+   puts "Snp.find_by_sql(\"SELECT * from snps INNER JOIN alleles ON alleles.snp_id = snps.id INNER JOIN genotypes ON alleles.id = genotypes.allele_id INNER JOIN strains ON strains.id = genotypes.strain_id WHERE (#{where_statement}) AND alleles.id <> snps.reference_allele_id AND (SELECT COUNT(*) from snps AS s INNER JOIN alleles ON alleles.snp_id = snps.id INNER JOIN genotypes ON alleles.id = genotypes.allele_id WHERE alleles.id <> snps.reference_allele_id and s.id = snps.id) = #{strain_names.size} GROUP BY snps.id HAVING COUNT(*) = #{strain_names.size}\")"
+end
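
Note on `find_shared_snps`: as added in this hunk, the method passes the `Snp.find_by_sql(...)` text to `puts`, so it prints the query instead of returning records. The strain filter is built by appending " OR " to every term and trimming the last one. A minimal sketch of the same filter using `join`, with single quotes doubled (the standard SQL escape for string literals), is shown below. The helper name `strain_where_clause` is hypothetical.

```ruby
# Hypothetical helper: builds the strain filter for the WHERE clause.
# Single quotes inside a name are doubled so the SQL literal stays valid.
def strain_where_clause(strain_names)
  strain_names.map { |name| "strains.name = '#{name.gsub("'", "''")}'" }.join(" OR ")
end

puts strain_where_clause(["strainA", "strainB"])
```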

data/lib/snp_db_connection.rb CHANGED

@@ -6,7 +6,5 @@ def establish_connection(db_location)
   ActiveRecord::Base.establish_connection(
     :adapter => "sqlite3",
     :database => db_location,
-    :pool => 5,
-    :timeout => 5000
   )
 end

data/snp-search.gemspec CHANGED

@@ -5,11 +5,11 @@
 Gem::Specification.new do |s|
   s.name = "snp-search"
-  s.version = "0.34.0"
+  s.version = "1.0.0"
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
   s.authors = ["Ali Al-Shahib", "Anthony Underwood"]
-  s.date = "2012-01-11"
+  s.date = "2012-05-10"
   s.description = "Use the snp-search tool to create, import, manipulate and query your SNP database"
   s.email = "ali.al-shahib@hpa.org.uk"
   s.executables = ["snp-search"]
@@ -28,9 +28,6 @@ Gem::Specification.new do |s|
     "Rakefile",
     "VERSION",
     "bin/snp-search",
-    "examples/example1.rb",
-    "examples/example2.rb",
-    "examples/snp_db_models.rb",
     "lib/snp-search.rb",
     "lib/snp_db_connection.rb",
     "lib/snp_db_models.rb",

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: snp-search
 version: !ruby/object:Gem::Version
-  version: 0.34.0
+  version: 1.0.0
   prerelease:
 platform: ruby
 authors:
@@ -10,11 +10,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-01-11 00:00:00.000000000Z
+date: 2012-05-10 00:00:00.000000000Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activerecord
-  requirement: &2166762620 !ruby/object:Gem::Requirement
+  requirement: &2165230340 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -22,10 +22,10 @@ dependencies:
         version: 3.1.3
   type: :runtime
   prerelease: false
-  version_requirements: *2166762620
+  version_requirements: *2165230340
 - !ruby/object:Gem::Dependency
   name: bio
-  requirement: &2166762140 !ruby/object:Gem::Requirement
+  requirement: &2165229420 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -33,10 +33,10 @@ dependencies:
         version: 1.4.2
   type: :runtime
   prerelease: false
-  version_requirements: *2166762140
+  version_requirements: *2165229420
 - !ruby/object:Gem::Dependency
   name: slop
-  requirement: &2166761620 !ruby/object:Gem::Requirement
+  requirement: &2165228320 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -44,10 +44,10 @@ dependencies:
         version: 2.4.0
   type: :runtime
   prerelease: false
-  version_requirements: *2166761620
+  version_requirements: *2165228320
 - !ruby/object:Gem::Dependency
   name: sqlite3
-  requirement: &2166761060 !ruby/object:Gem::Requirement
+  requirement: &2165227400 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -55,10 +55,10 @@ dependencies:
         version: 1.3.4
   type: :runtime
   prerelease: false
-  version_requirements: *2166761060
+  version_requirements: *2165227400
 - !ruby/object:Gem::Dependency
   name: activerecord-import
-  requirement: &2166760580 !ruby/object:Gem::Requirement
+  requirement: &2165226380 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -66,10 +66,10 @@ dependencies:
         version: 0.2.8
   type: :runtime
   prerelease: false
-  version_requirements: *2166760580
+  version_requirements: *2165226380
 - !ruby/object:Gem::Dependency
   name: rspec
-  requirement: &2166760100 !ruby/object:Gem::Requirement
+  requirement: &2165225400 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -77,10 +77,10 @@ dependencies:
         version: 2.3.0
   type: :development
   prerelease: false
-  version_requirements: *2166760100
+  version_requirements: *2165225400
 - !ruby/object:Gem::Dependency
   name: bundler
-  requirement: &2166759620 !ruby/object:Gem::Requirement
+  requirement: &2165224600 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -88,10 +88,10 @@ dependencies:
         version: 1.0.0
   type: :development
   prerelease: false
-  version_requirements: *2166759620
+  version_requirements: *2165224600
 - !ruby/object:Gem::Dependency
   name: jeweler
-  requirement: &2166759120 !ruby/object:Gem::Requirement
+  requirement: &2165223220 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -99,10 +99,10 @@ dependencies:
         version: 1.6.4
   type: :development
   prerelease: false
-  version_requirements: *2166759120
+  version_requirements: *2165223220
 - !ruby/object:Gem::Dependency
   name: rcov
-  requirement: &2166758600 !ruby/object:Gem::Requirement
+  requirement: &2165222000 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -110,7 +110,7 @@ dependencies:
         version: '0'
   type: :development
   prerelease: false
-  version_requirements: *2166758600
+  version_requirements: *2165222000
 description: Use the snp-search tool to create, import, manipulate and query your
   SNP database
 email: ali.al-shahib@hpa.org.uk
@@ -131,9 +131,6 @@ files:
 - Rakefile
 - VERSION
 - bin/snp-search
-- examples/example1.rb
-- examples/example2.rb
-- examples/snp_db_models.rb
 - lib/snp-search.rb
 - lib/snp_db_connection.rb
 - lib/snp_db_models.rb
@@ -156,7 +153,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
       segments:
       - 0
-      hash: -1735176367152600706
+      hash: 1630410471760364863
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:

data/examples/example1.rb DELETED

@@ -1,92 +0,0 @@
-# This query script removes the 'phage' genes from the database.
-# Only use this script once your database has been fully populated.
-# Usage: ruby example1.rb -d your_db_name.sqlite3 -s list_of_your_species.txt -o output.fasta
-# You may use this script to do other SQL queries that result in a fasta output.  Just change the 'snps' SQL query below with your query.
-require 'snp_db_models'
-gem "slop", "~> 2.4.0"
-require 'slop'
-opts = Slop.new :help do
-  banner "ruby query.rb [OPTIONS]"
-  on :V, :verbose, 'Enable verbose mode'
-  on :D, :database=, 'The name of the database you like to query', true
-  on :o, :outfile=, 'output file, in fasta format', true
-  on :s, :strain=, 'The strains/samples you like to query', true
-  on :a, :annotation=, 'The gene you like to remove from analysis', true
-  on_empty do
-    puts help
-  end
-end
-opts.parse
-  puts "You must supply the -s option, it's a required field" and exit unless opts[:strain]
-  puts "You must supply the -D option, it's a required field" and exit unless opts[:database]
-begin
-	puts "#{opts[:database]} file does not exist!" and exit unless File.exist?(opts[:database])
-rescue
-end
-begin
-	puts "#{opts[:strain]} file does not exist!" and exit unless File.exist?(opts[:strain])
-rescue
-end
-annotation = opts[:annotation]
-establish_connection(opts[:database])
-begin
-strains = []
-  File.read(opts[:strain]).each_line do |line|
-    strains << line.chop
-  end
-# Enter the name of your database
-outfile = File.open(opts[:outfile], "w")
-# create a sequence hash
-sequence_hash = Hash.new
-# create an array of strains
-# hash key is strain_name, loop through strain_names
-# create an empty array
-strains.each do |strain_name|
-	sequence_hash[strain_name] = Array.new
-end
-snps = Snp.find_by_sql("SELECT snps.* FROM snps
-          INNER JOIN features
-          ON features.id = snps.feature_id
-          WHERE features.id IN
-            (select features.id from features
-            WHERE id NOT IN
-              (select distinct features.id FROM features
-              INNER JOIN annotations ON
-              annotations.feature_id = features.id
-              WHERE annotations.value LIKE '%(#{annotation})%'))")
-#puts snps.size
-puts "Your Query is submitted and is being processed......."
-snps.each do |snp|
-	#break if i == 100
-	snp.alleles.each do |allele|
-		allele.genotypes.each do |genotype|
-			#	puts genotype.inspect
-			sequence_hash[genotype.strain.name] << allele.base
-		end
-	end
-end
-strains.each do |sn|
-	outfile.print ">#{sn}\n" , sequence_hash[sn].join("")
-	outfile.puts
-end
-rescue
-end
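
Note on the annotation filter: the deleted script above interpolates the annotation text directly into a `LIKE '%...%'` clause. A hedged sketch of an alternative is to build the pattern in Ruby, escape the LIKE wildcards, and pass it as a bind value. The helper name `like_pattern` and the backslash escape character are assumptions for illustration. SQLite only honours an escape character when the query includes an `ESCAPE` clause.

```ruby
# Hypothetical helper: wraps a search term in % wildcards after escaping
# the characters that LIKE treats specially (%, _ and the escape itself).
def like_pattern(term, escape = "\\")
  escaped = term.gsub(/[\\%_]/) { |ch| escape + ch }
  "%#{escaped}%"
end

puts like_pattern("phage")
# The pattern could then be bound rather than interpolated, for example:
#   Snp.find_by_sql(["... WHERE annotations.value LIKE ? ESCAPE '\\'", like_pattern(term)])
# (find_by_sql accepts an array of SQL text followed by bind values).
```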

data/examples/example2.rb DELETED

@@ -1,61 +0,0 @@
-# This query script finds the unique snps amongs the list of strains provided.
-# Only use this script once your database has been fully populated.
-# Usage: ruby example2.rb -d your_db_name.sqlite3 -s list_of_your_species.txt
-# Output is the number of unique snps in the list of your strains provided in the -s option.
-# You may use this script to do other SQL queries.  Just change the SQL query below with your query.
-require  'snp_db_models'
-gem "slop", "~> 2.4.0"
-require 'slop'
-opts = Slop.new :help do
-  banner "ruby query.rb [OPTIONS]"
-  on :V, :verbose, 'Enable verbose mode'
-  on :D, :database=, 'The name of the database you like to query', true
-  on :s, :strain=, 'The strains/samples you like to query', true
-  on_empty do
-    puts help
-  end
-end
-opts.parse
-  puts "You must supply the -D option, it's a required field" and exit unless opts[:database]
-  puts "You must supply the -s option, it's a required field" and exit unless opts[:strain]
-begin
-	puts "#{opts[:database]} file does not exist!" and exit unless File.exist?(opts[:database])
-rescue
-end
-begin
-	puts "#{opts[:strain]} file does not exist!" and exit unless File.exist?(opts[:strain])
-rescue
-end
-establish_connection(opts[:database])
-begin
-strains = []
-  File.read(opts[:strain]).each_line do |line|
-    strains << line.chop
-  end
-def find_shared_snps(strain_names)
-  *strain_names = strain_names
- where_statement = strain_names.collect{|strain_name| "strains.name = '#{strain_name}' OR "}.join("").sub(/ OR $/, "")
- return Snp.find_by_sql("SELECT * FROM (SELECT features.* from features INNER JOIN snps ON features.id = snps.feature_id INNER JOIN alleles ON alleles.snp_id = snps.id INNER JOIN genotypes ON alleles.id = genotypes.allele_id INNER JOIN strains ON strains.id = genotypes.strain_id WHERE (#{where_statement}) AND alleles.id <> snps.reference_allele_id AND (SELECT COUNT(*) from snps AS s INNER JOIN alleles ON alleles.snp_id = snps.id INNER JOIN genotypes ON alleles.id = genotypes.allele_id WHERE alleles.id <> snps.reference_allele_id and s.id = snps.id) = #{strain_names.size} GROUP BY snps.id HAVING COUNT(*) = #{strain_names.size})");
-end
-gas_snps = find_shared_snps(strains)
-gas_snps.each do |snp|
-	puts "The number of unique snps are #{snp.id}"
-end
-rescue
-end
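
Note on reading the strain list: this deleted script, like the new bin/snp-search, strips each line with `line.chop`. `String#chop` removes the last character unconditionally, so a final line without a trailing newline loses its last letter. A minimal sketch using `strip` instead is shown below. The helper name `parse_strain_names` is hypothetical.

```ruby
# Hypothetical helper: one strain name per line, surrounding whitespace
# removed and blank lines skipped.
def parse_strain_names(text)
  text.each_line.map(&:strip).reject(&:empty?)
end

p parse_strain_names("strainA\nstrainB")  # last line has no trailing newline
p "strainB".chop                          # chop drops the final "B"
```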

data/examples/snp_db_models.rb DELETED

@@ -1,32 +0,0 @@
-require 'snp_db_connection'
-class Strain < ActiveRecord::Base
-  has_many :alleles, :through => :genotypes
-  has_many :genotypes
-end
-class Feature < ActiveRecord::Base
-  has_many :annotations
-  has_many :snps
-end
-class Snp < ActiveRecord::Base
-  belongs_to :feature
-  has_many  :alleles
-  belongs_to :reference_allele, :class_name => "Allele", :foreign_key => "reference_allele_id"
-end
-class Allele < ActiveRecord::Base
-  has_many :genotypes
-  belongs_to :snp
-  has_many :strains, :through => :genotypes
-end
-class Genotype < ActiveRecord::Base
-  belongs_to :allele
-  belongs_to :strain
-end
-class Annotation < ActiveRecord::Base
-  belongs_to :feature
-end