snp-search 0.34.0 → 1.0.0

data/README.rdoc CHANGED
@@ -1,6 +1,6 @@
  = snp-search
 
- SNPsearch is a tool that manages SNP data and allows for data importing, manipulating, editing and complex querying of SNP data. It can be used to evaluate the utility of SNPs for the assessment of genetic diversity between haploid strains and the management of genotype and phenotype data. Once a query is performed, SNPsearch can be used to convert the selected SNP data into FASTA sequences. SNPsearch is particularly useful in the analysis of phylogenetic trees that are based on SNP differences across whole core genomes. Queries can be made to answer critical genomic questions such as the association of SNPs with particular phenotypes.
+ SNPsearch is a tool that manages SNP data and allows for data importing, manipulating, editing and complex querying of SNP data. It can be used to evaluate the utility of SNPs for the assessment of genetic diversity between haploid strains and the management of genotype and phenotype data. Once the database is created, the user is provided with several query and output options. SNPsearch is particularly useful in the analysis of phylogenetic trees that are based on SNP differences across whole core genomes. Queries can be made to answer critical genomic questions such as the association of SNPs with particular phenotypes.
 
  == Obtaining and installing the code
  SNPsearch is written in Ruby and operates in a Unix environment. It is made available as a gem. See the github site for more information (https://github.com/hpa-bioinformatics/snp-search).
@@ -15,70 +15,79 @@ Not much, you just need:
  * Unix. Once snp-search is installed, all the necessary gems to run snp-search will also be installed from Rubygems (note that Rubygems requires admin privileges. If you do not have admin privileges then we suggest you install RVM: (http://beginrescueend.com/rvm/install/) and then gem install snp-search).
  * ruby version 1.8.7 and above.
 
+ * Optional: FastTree. If you require a tree output in Newick format, you must install FastTree from http://www.microbesonline.org/fasttree/#Install. You must specify the path of the executable in your .bashrc or .profile file as snp-search will run the command as just 'FastTree' and will not know where FastTree is if it is not specified in your .bashrc or .profile file.
+
  Thats it!
 
  == Running snp-search
 
- To run snp-search, you need two files:
+ 1- Creating the database (snp-search -create)
+
+ Two files are needed to create the SQLite3 database:
 
- 1- Variant Call Format (.vcf) file (which contains the SNP information)
+ 1- Variant Call Format (.vcf) file (which contains the SNP information)
 
- 2- Your database reference genome that you used to generate your .vcf file (in genbank or embl format, the script will automatically detect the format).
+ 2- Your database reference genome that you used to generate your .vcf file (in genbank or embl format, the script will automatically detect the format).
 
- Once you have these files ready, you may run snp-search with the following options:
+ You need the following parameters:
 
- -V Enable verbose mode
  -n Name of your database
- -v .vcf file Required
- -d Database Reference genome (The same file that was used in generating the .vcf file). This should be in genbank or embl format. Required
+ -v .vcf file
+ -d Database Reference genome (The same file that was used in generating the .vcf file). This should be in genbank or embl format.
+
+ Other options:
  -c SNP quality score cutoff. A Phred-scaled quality score. High quality scores indicate high confidence calls. Optional, default = 90 (out of 100)
- -t Genotype Quality score cutoff. Phred-scaled quality score that the genotype is true. Optional, default = 30
+ -g Genotype Quality score cutoff. Phred-scaled quality score that the genotype is true. Optional, default = 30
  -h help message
 
- Usage:
- snp-search -n my_snp_db.sqlite3 -d my_ref.gbk -v my_vcf_file.vcf
+ Usage:
+ snp-search -create -n my_snp_db.sqlite3 -d my_ref.gbk -v my_vcf_file.vcf
 
- Note: The strain names in your database will be taken from your vcf file so make sure they are named appropriately in your vcf file.
+ Note: The strain names in your database will be taken from your vcf file so make sure they are named appropriately in your vcf file.
 
- == Output
- The output is your database in sqlite3 format. If you like to view your table(s) and perform queries you can type
- sqlite3 snp_db.sqlite3
+ 2- Querying the Database (snp-search -query)
 
- Alternatively, you may download a SQL tool to view your database (e.g. SQLite sorcerer).
+ Two queries are currently scripted in SNPsearch:
 
- Also, depending on the query, a concatenated SNP FASTA file may be outputed (see below).
+ 1- genes_query: This option queries the database and selects the number of unique SNPs within the list of the strains/samples provided. The output is the number of unique SNPs.
 
- == Examples
+ You need the following parameters:
 
- We have included two example queries that you may find useful:
+ -n Name of your database
+ -s The strains/samples you like to query
 
- * Example1: This script queries the database to select only those SNPs not found in phage related genes. These SNPs were used to make a concatenated SNP multiple alignment file (FASTA format). This is a way of removing a set of genes that are not needed for the SNP analysis. You may use this script to do other SQL queries that result in a FASTA output.
+ Usage:
+ snp-search -n my_snp_db.sqlite3 -s list_of_my_strains.txt
 
- Usage:
+ 2- remove_genes: This option queries the database to select only those SNPs not found in a specified gene. These SNPs are used to make a concatenated SNP multiple alignment file (FASTA format). This is a way of removing a set of genes (likely to be mobile element genes) that are not needed for SNP analysis. The user has the option of generating a core SNP tree Newick file for SNP phylogeny.
 
- ruby example1.rb -D your_db_name.sqlite3 -s list_of_your_species.txt -o output.fasta
+ You need the following parameters:
 
- options:
+ -n Name of your database
+ -a The gene you like to remove from analysis
+ -o Output file, in fasta format
 
- -V, Enable verbose mode
- -D, The name of the database you like to query, Required
- -o, output file, in fasta format
- -s, The strains/samples you like to query, Required
- -a, The gene you like to remove from analysis
- -h, Print this help message
+ options:
+ -t Generate SNP phylogeny
+ -w Output tree in Newick format
 
- * Example2: This script queries the database and selects the number of unique SNPs within the list of the strains/samples provided. The output is the number of unique SNPs.
+ Usage (phage is used as the example gene):
+ snp-search -n my_snp_db.sqlite3 -a phage -o snps_sequences_without_phage.fasta -t -w snps_sequences_without_phage.nwk
 
- Usage:
+ The algorithm FastTree is used to generate the nwk file. FastTree can be downloaded from http://www.microbesonline.org/fasttree/#Install (see above)
 
- ruby example2.rb -D your_db_name.sqlite3 -s list_of_your_species.txt
+ 3- Output database (snp-search -out_file)
 
- options:
+ You need the following parameters:
 
- -V, Enable verbose mode
- -D, The name of the database you like to query, Required
- -s, The strains/samples you like to query, Required
- -h, Print this help message
+ -n Name of your database
+ -o Output file containing the database in fasta format
+
+ == View database in Unix or in a GUI
+ Your database will be in sqlite3 format. If you like to view your table(s) and perform direct queries you can type
+ sqlite3 snp_db.sqlite3
+
+ Alternatively, you may download a SQL tool to view your database (e.g. SQLite sorcerer).
 
  == Contact
 
data/VERSION CHANGED
@@ -1 +1 @@
- 0.34.0
+ 1.0.0
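The README change above notes that snp-search invokes FastTree by its bare name, so the executable has to be discoverable through PATH. A minimal Ruby sketch of such a check follows. It assumes nothing about snp-search internals, and the helper name `executable_on_path?` is hypothetical, not part of the gem:

```ruby
# Hypothetical helper (not part of snp-search): reports whether a command
# name resolves to an executable file in one of the PATH directories.
def executable_on_path?(name, path_env = ENV["PATH"].to_s)
  path_env.split(File::PATH_SEPARATOR).any? do |dir|
    next false if dir.empty?
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

if __FILE__ == $0
  if executable_on_path?("FastTree")
    puts "FastTree found on PATH"
  else
    puts "FastTree not found; add its directory to PATH in .bashrc or .profile"
  end
end
```

Running a check like this before requesting a tree would surface a missing FastTree up front rather than after the FASTA file has been written.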
data/bin/snp-search CHANGED
@@ -2,67 +2,266 @@ require 'snp-search'
  require 'snp_db_connection'
  require 'snp_db_models'
  require 'snp_db_schema'
-
+ require 'activerecord-import'
+ # gem "slop", "~> 3.1.0"
  gem "slop", "~> 2.4.0"
  require 'slop'
 
- opts = Slop.new :help do
- banner "ruby snp-search [OPTIONS]"
+ opts = Slop.new do
+
+ # separator 'test'
 
- on :V, :verbose, 'Enable verbose mode'
- on :n, :name=, 'Name of database, Required', true
- on :d, :database_reference_file=, 'Reference genome file, in gbk or embl file format, Required', true
- on :v, :vcf_file=, '.vcf file, Required', true
- on :c, :cuttoff_snp=, 'SNP quality cutoff, (default = 90)', :default => 90
- on :t, :cuttoff_genotype=, 'Genotype quality cutoff (default = 30)', :default => 30
-
+ banner "\nruby snp-search [OPTIONS]"
+ on :C, :create, 'Create database'
+ on :Q, :query, 'Query database'
+ on :O, :out_file, 'Output the database to a file'
+ # separator ''
+ # separator 'README file: https://github.com/hpa-bioinformatics/snp-search/blob/master/README.rdoc'
+ # separator 'The following command must be used when using -create, or -query or -out_file'
+ on :n, :name=, 'Name of database, Required'
+ # separator ''
+ # separator '-create options'
+ on :d, :database_reference_file, 'Reference genome file, in gbk or embl file format, Required', true
+ on :v, :vcf_file, '.vcf file, Required', true
+ on :c, :cuttoff_snp, 'SNP quality cutoff, (default = 90)', :default => 90
+ on :g, :cuttoff_genotype, 'Genotype quality cutoff (default = 30)', :default => 30
+ # separator ''
+ # separator '-query options'
+ on :G, :genes_query, 'Query for unique genes in the database'
+ on :R, :remove_genes, 'Remove set of genes from database and create FASTA file'
+ on :s, :strain=, 'The strains/samples you like to query, Required'
+ on :a, :annotation=, 'The gene you like to remove from analysis'
+ on :o, :output=, 'output file, in fasta format'
+ on :t, :tree, 'Generate SNP phylogeny'
+ on :w, :tree_nwk_output=, 'output tree in Newick format'
+ on :S, :syn, 'syn'
  end
  opts.parse
 
- error_msg = ""
+ ###########################################################
 
- error_msg += "You must supply the -n option, it's a required field\n" unless opts[:name]
- error_msg += "You must supply the -d option, it's a required field\n" unless opts[:database_reference_file]
- error_msg += "You must supply the -v option, it's a required field" unless opts[:vcf_file]
+ # CREATING A DATABASE
+ if opts[:create]
 
- unless error_msg == ""
- puts error_msg
- puts opts.help unless opts.empty?
- exit
+
+ error_msg = ""
+
+ error_msg += "-n option: \t the name of your database\n" unless opts[:name]
+ error_msg += "-d option: \t reference genome file, in gbk or embl file format\n" unless opts[:database_reference_file]
+ error_msg += "-v option: \t .vcf file\n" unless opts[:vcf_file]
+
+ unless error_msg == ""
+ puts "Please provide the following required fields:"
+ puts error_msg
+ puts opts.help unless opts.empty?
+ exit
+ end
+
+
+ abort "#{opts[:database_reference_file]} file does not exist!" unless File.exist?(opts[:database_reference_file])
+
+ abort "#{opts[:vcf_file]} file does not exist!" unless File.exist?(opts[:vcf_file])
+
+
+ # Name of your database
+ establish_connection(opts[:name])
+
+ # Schema will run here
+ db_schema
+
+ ref = opts[:database_reference_file]
+
+ sequence_format = guess_sequence_format(ref)
+
+ case sequence_format
+ when :genbank
+ sequence_flatfile = Bio::FlatFile.open(Bio::GenBank,opts[:database_reference_file]).next_entry
+ when :embl
+ sequence_flatfile = Bio::FlatFile.open(Bio::EMBL,opts[:database_reference_file]).next_entry
+ else
+ puts "All sequence files should be in genbank or embl format"
+ exit
+ end
+
+ # path for vcf file here
+ vcf_mpileup_file = opts[:vcf_file]
+
+ # The populate_features_and_annotations method populates the features and annotations. It uses the embl/gbk file.
+ populate_features_and_annotations(sequence_flatfile)
+
+ #The populate_snps_alleles_genotypes method populates the snps, alleles and genotypes. It uses the vcf file, and if specified, the SNP quality cutoff and genotype quality cutoff
+ populate_snps_alleles_genotypes(vcf_mpileup_file, opts[:cuttoff_snp].to_i, opts[:cuttoff_genotype].to_i)
+
+ ###########################################################
+
+ # QUERYING THE DATABASE
+ elsif opts [:query]
+ #FIND UNIQUE SNPS
+ if opts[:genes_query]
+
+ error_msg = ""
+
+ error_msg += "-n option, \t the name of your database\n" unless opts[:name]
+ error_msg += "-s option, \t list of strains you like to query\n" unless opts[:strain]
+
+ unless error_msg == ""
+ puts "Please provide the following required fields:"
+ puts error_msg
+ puts opts.help unless opts.empty?
+ exit
+ end
+
+ abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
+ abort "#{opts[:strain]} file does not exist!" unless File.exist?(opts[:strain])
+
+ establish_connection(opts[:name])
+
+ strains = []
+ File.read(opts[:strain]).each_line do |line|
+ strains << line.chop
+ end
+
+ # puts find_shared_snps(strains)
+ # exit
+ gas_snps = find_shared_snps(strains)
+
+ gas_snps.each do |snp|
+ puts "The number of unique snps are #{snp.id}.size"
+ end
+
+ ################################################################
+ # REMOVE SNPS ASSOCIATED WITH SPECIFIC GENES
+ elsif opts[:remove_genes]
+
+ error_msg = ""
+
+ error_msg += "-n option: \t the name of your database\n" unless opts[:name]
+ error_msg += "-o option: \t name of your output file\n" unless opts[:output]
+ error_msg += "-a option: \t name of the gene that you like to remove from the database\n" unless opts[:annotation]
+
+ unless error_msg == ""
+ puts "Please provide the following required fields:"
+ puts error_msg
+ puts opts.help unless opts.empty?
+ exit
+ end
+
+ abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
+
+ # annotation = opts[:annotation]
+ establish_connection(opts[:name])
+
+
+ # Getting list of strains from database
+ strains = Strain.all
+
+ sequence_hash = Hash.new
+ # create a sequence hash
+ # hash key is strain_id, loop through strain_id
+ # create an empty array
+ strains.each do |strain|
+ sequence_hash[strain.id] = Array.new
+ end
+
+ # output opened for data input
+ output = File.open("#{opts[:output]}", "w")
+
+ # Perform query
+ snps = Snp.includes(:alleles => :genotypes).find_by_sql("SELECT snps.* FROM snps INNER JOIN features ON features.id = snps.feature_id WHERE features.id NOT IN (select distinct features.id FROM features INNER JOIN annotations ON annotations.feature_id = features.id WHERE annotations.value LIKE '%#{opts[:annotation]}%')")
+
+ i = 0
+ puts "Your Query is submitted and is being processed......."
+ snps.each do |snp|
+ # puts snp.inspect
+ i += 1
+ puts "Total number of SNPs generated so far: #{i}" if i % 100 == 0
+ ActiveRecord::Base.transaction do
+ snp.alleles.each do |allele|
+ # puts allele.inspect
+ allele.genotypes.each do |genotype|
+ #push bases to hash
+ sequence_hash[genotype.strain_id] << allele.base
+ end
+ end
+ end
+ end
+
+ #generate FASTA file
+ strains.each do |strain|
+ output.print ">#{strain.name}\n" , sequence_hash[strain.id].join("")
+ output.puts
+ end
+
+ if opts[:tree]
+ `FastTree -fastest -nt #{opts[:output]} > #{opts[:w]}`
+ end
  end
-
- abort "#{opts[:database_reference_file]} file does not exist!" unless File.exist?(opts[:database_reference_file])
-
- abort "#{opts[:vcf_file]} file does not exist!" unless File.exist?(opts[:vcf_file])
 
+ ##############################################################
 
- # Name of your database
- establish_connection(opts[:name])
+ # OUTPUT DATABASE IN FASTA FORMAT
+ elsif opts[:out_file]
+ error_msg = ""
 
- # Schema will run here
- db_schema
+ error_msg += "-n option: \t the name of your database\n" unless opts[:name]
+ error_msg += "-o option: \t name of your output file\n" unless opts[:output]
 
- ref = opts[:database_reference_file]
+ unless error_msg == ""
+ puts "Please provide the following required fields:"
+ puts error_msg
+ puts opts.help unless opts.empty?
+ exit
+ end
 
- sequence_format = guess_sequence_format(ref)
+ abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
 
- case sequence_format
- when :genbank
- sequence_flatfile = Bio::FlatFile.open(Bio::GenBank,opts[:database_reference_file]).next_entry
- when :embl
- sequence_flatfile = Bio::FlatFile.open(Bio::EMBL,opts[:database_reference_file]).next_entry
- else
- puts "All sequence files should be of genbank or embl format"
- exit
+ establish_connection(opts[:name])
+
+ # Getting list of strains from database
+ strains = Strain.all
+
+ sequence_hash = Hash.new
+ # create a sequence hash
+ # hash key is strain_id, loop through strain_id
+ # create an empty array
+ strains.each do |strain|
+ sequence_hash[strain.id] = Array.new
  end
 
 
- # path for vcf file here
- vcf_mpileup_file = opts[:vcf_file]
+ output = File.open("#{opts[:output]}", "w")
+
+ # Select all snps
+ snps = Snp.all
+
+ i = 0
+ puts "Your out file is being prepared......."
+ snps.each do |snp|
+ i += 1
+ puts "Total number of SNPs outputted so far: #{i}" if i % 100 == 0
 
- # The populate_features_and_annotations method populates the features and annotations. It uses the embl/gbk file.
- populate_features_and_annotations(sequence_flatfile)
+ ActiveRecord::Base.transaction do
+ snp.alleles.each do |allele|
+ # puts allele.inspect
+ allele.genotypes.each do |genotype|
+ #push bases to hash
+ sequence_hash[genotype.strain_id] << allele.base
+ end
+ end
+ end
+
 
- #The populate_snps_alleles_genotypes method populates the snps, alleles and genotypes. It uses the vcf file, and if specified, the SNP quality cutoff and genotype quality cutoff
- populate_snps_alleles_genotypes(vcf_mpileup_file, opts[:cuttoff_snp].to_i, opts[:cuttoff_genotype].to_i)
+ #generate FASTA file
+ strains.each do |strain|
+ output.print ">#{strain.name}\n" , sequence_hash[strain.id].join("")
+ output.puts
+ end
 
+ if opts[:tree]
+ `FastTree -fastest -nt #{opts[:output]} > #{opts[:w]}`
+ end
+ end
+
+ else
+ puts opts.help
+ end
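Both the remove_genes and out_file branches above collect one base per SNP for each strain and then print one FASTA record per strain. A minimal sketch of that concatenation step in isolation, using plain Ruby objects instead of the ActiveRecord models. `StrainRecord` is a stand-in Struct and `build_snp_fasta` is a hypothetical name; neither is taken from the gem:

```ruby
# Stand-in for the ActiveRecord Strain model (assumption for illustration).
StrainRecord = Struct.new(:id, :name)

# Hypothetical helper: turns {strain_id => [bases...]} into FASTA text,
# one record per strain, with the bases joined in insertion order.
def build_snp_fasta(strains, sequence_hash)
  strains.map do |strain|
    ">#{strain.name}\n#{sequence_hash.fetch(strain.id, []).join}\n"
  end.join
end

strains = [StrainRecord.new(1, "strain_A"), StrainRecord.new(2, "strain_B")]
sequence_hash = { 1 => %w[A C G], 2 => %w[A T G] }

puts build_snp_fasta(strains, sequence_hash)
```

The rows stay aligned only if every strain contributes a base at every SNP; the sketch does not check this.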
data/lib/snp-search.rb CHANGED
@@ -166,3 +166,14 @@ puts "Adding SNPs........"
  snp.save
  end
  end
+
+ def find_shared_snps(strain_names)
+ *strain_names = strain_names
+
+ where_statement = strain_names.collect{|strain_name| "strains.name = '#{strain_name}' OR "}.join("").sub(/ OR $/, "")
+
+ puts "Snp.find_by_sql(\"SELECT * from snps INNER JOIN alleles ON alleles.snp_id = snps.id INNER JOIN genotypes ON alleles.id = genotypes.allele_id INNER JOIN strains ON strains.id = genotypes.strain_id WHERE (#{where_statement}) AND alleles.id <> snps.reference_allele_id AND (SELECT COUNT(*) from snps AS s INNER JOIN alleles ON alleles.snp_id = snps.id INNER JOIN genotypes ON alleles.id = genotypes.allele_id WHERE alleles.id <> snps.reference_allele_id and s.id = snps.id) = #{strain_names.size} GROUP BY snps.id HAVING COUNT(*) = #{strain_names.size}\")"
+ end
+
+
+
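The added `find_shared_snps` builds its strain filter by interpolating names straight into the SQL text, and as written in this hunk it prints the statement with `puts` rather than returning query results. The sketch below shows one alternative for the filter only, assuming `?` bind placeholders are acceptable: it returns the SQL fragment and the bind values separately. `strain_filter` is a hypothetical helper, not part of snp-search.

```ruby
# Hypothetical helper: builds "strains.name IN (?, ?, ...)" plus the values
# to bind, so strain names are never spliced into the SQL string.
def strain_filter(strain_names)
  names = Array(strain_names).map(&:to_s).reject(&:empty?)
  raise ArgumentError, "at least one strain name is required" if names.empty?

  placeholders = (["?"] * names.size).join(", ")
  ["strains.name IN (#{placeholders})", names]
end

fragment, binds = strain_filter(["GAS_1", "GAS_2", "O'Brien"])
puts fragment
puts binds.inspect
```

With ActiveRecord, an array of the form `[sql, *binds]` can be passed to `find_by_sql`, which substitutes and quotes the values, so a name containing a quote character does not break the statement.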
@@ -6,7 +6,5 @@ def establish_connection(db_location)
  ActiveRecord::Base.establish_connection(
  :adapter => "sqlite3",
  :database => db_location,
- :pool => 5,
- :timeout => 5000
  )
  end
data/snp-search.gemspec CHANGED
@@ -5,11 +5,11 @@
 
  Gem::Specification.new do |s|
  s.name = "snp-search"
- s.version = "0.34.0"
+ s.version = "1.0.0"
 
  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
  s.authors = ["Ali Al-Shahib", "Anthony Underwood"]
- s.date = "2012-01-11"
+ s.date = "2012-05-10"
  s.description = "Use the snp-search tool to create, import, manipulate and query your SNP database"
  s.email = "ali.al-shahib@hpa.org.uk"
  s.executables = ["snp-search"]
@@ -28,9 +28,6 @@ Gem::Specification.new do |s|
  "Rakefile",
  "VERSION",
  "bin/snp-search",
- "examples/example1.rb",
- "examples/example2.rb",
- "examples/snp_db_models.rb",
  "lib/snp-search.rb",
  "lib/snp_db_connection.rb",
  "lib/snp_db_models.rb",
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: snp-search
  version: !ruby/object:Gem::Version
- version: 0.34.0
+ version: 1.0.0
  prerelease:
  platform: ruby
  authors:
@@ -10,11 +10,11 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-01-11 00:00:00.000000000Z
+ date: 2012-05-10 00:00:00.000000000Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: activerecord
- requirement: &2166762620 !ruby/object:Gem::Requirement
+ requirement: &2165230340 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -22,10 +22,10 @@ dependencies:
  version: 3.1.3
  type: :runtime
  prerelease: false
- version_requirements: *2166762620
+ version_requirements: *2165230340
  - !ruby/object:Gem::Dependency
  name: bio
- requirement: &2166762140 !ruby/object:Gem::Requirement
+ requirement: &2165229420 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -33,10 +33,10 @@ dependencies:
  version: 1.4.2
  type: :runtime
  prerelease: false
- version_requirements: *2166762140
+ version_requirements: *2165229420
  - !ruby/object:Gem::Dependency
  name: slop
- requirement: &2166761620 !ruby/object:Gem::Requirement
+ requirement: &2165228320 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -44,10 +44,10 @@ dependencies:
  version: 2.4.0
  type: :runtime
  prerelease: false
- version_requirements: *2166761620
+ version_requirements: *2165228320
  - !ruby/object:Gem::Dependency
  name: sqlite3
- requirement: &2166761060 !ruby/object:Gem::Requirement
+ requirement: &2165227400 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -55,10 +55,10 @@ dependencies:
  version: 1.3.4
  type: :runtime
  prerelease: false
- version_requirements: *2166761060
+ version_requirements: *2165227400
  - !ruby/object:Gem::Dependency
  name: activerecord-import
- requirement: &2166760580 !ruby/object:Gem::Requirement
+ requirement: &2165226380 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -66,10 +66,10 @@ dependencies:
  version: 0.2.8
  type: :runtime
  prerelease: false
- version_requirements: *2166760580
+ version_requirements: *2165226380
  - !ruby/object:Gem::Dependency
  name: rspec
- requirement: &2166760100 !ruby/object:Gem::Requirement
+ requirement: &2165225400 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -77,10 +77,10 @@ dependencies:
  version: 2.3.0
  type: :development
  prerelease: false
- version_requirements: *2166760100
+ version_requirements: *2165225400
  - !ruby/object:Gem::Dependency
  name: bundler
- requirement: &2166759620 !ruby/object:Gem::Requirement
+ requirement: &2165224600 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -88,10 +88,10 @@ dependencies:
  version: 1.0.0
  type: :development
  prerelease: false
- version_requirements: *2166759620
+ version_requirements: *2165224600
  - !ruby/object:Gem::Dependency
  name: jeweler
- requirement: &2166759120 !ruby/object:Gem::Requirement
+ requirement: &2165223220 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -99,10 +99,10 @@ dependencies:
  version: 1.6.4
  type: :development
  prerelease: false
- version_requirements: *2166759120
+ version_requirements: *2165223220
  - !ruby/object:Gem::Dependency
  name: rcov
- requirement: &2166758600 !ruby/object:Gem::Requirement
+ requirement: &2165222000 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
@@ -110,7 +110,7 @@ dependencies:
  version: '0'
  type: :development
  prerelease: false
- version_requirements: *2166758600
+ version_requirements: *2165222000
  description: Use the snp-search tool to create, import, manipulate and query your
  SNP database
  email: ali.al-shahib@hpa.org.uk
@@ -131,9 +131,6 @@ files:
  - Rakefile
  - VERSION
  - bin/snp-search
- - examples/example1.rb
- - examples/example2.rb
- - examples/snp_db_models.rb
  - lib/snp-search.rb
  - lib/snp_db_connection.rb
  - lib/snp_db_models.rb
@@ -156,7 +153,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
  version: '0'
  segments:
  - 0
- hash: -1735176367152600706
+ hash: 1630410471760364863
  required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
data/examples/example1.rb DELETED
@@ -1,92 +0,0 @@
- # This query script removes the 'phage' genes from the database.
- # Only use this script once your database has been fully populated.
- # Usage: ruby example1.rb -d your_db_name.sqlite3 -s list_of_your_species.txt -o output.fasta
- # You may use this script to do other SQL queries that result in a fasta output. Just change the 'snps' SQL query below with your query.
- require 'snp_db_models'
- gem "slop", "~> 2.4.0"
- require 'slop'
-
- opts = Slop.new :help do
- banner "ruby query.rb [OPTIONS]"
-
- on :V, :verbose, 'Enable verbose mode'
- on :D, :database=, 'The name of the database you like to query', true
- on :o, :outfile=, 'output file, in fasta format', true
- on :s, :strain=, 'The strains/samples you like to query', true
- on :a, :annotation=, 'The gene you like to remove from analysis', true
-
- on_empty do
- puts help
- end
- end
- opts.parse
-
- puts "You must supply the -s option, it's a required field" and exit unless opts[:strain]
- puts "You must supply the -D option, it's a required field" and exit unless opts[:database]
-
- begin
- puts "#{opts[:database]} file does not exist!" and exit unless File.exist?(opts[:database])
- rescue
- end
-
- begin
- puts "#{opts[:strain]} file does not exist!" and exit unless File.exist?(opts[:strain])
- rescue
- end
-
- annotation = opts[:annotation]
- establish_connection(opts[:database])
-
- begin
- strains = []
- File.read(opts[:strain]).each_line do |line|
- strains << line.chop
- end
-
- # Enter the name of your database
-
- outfile = File.open(opts[:outfile], "w")
-
-
- # create a sequence hash
- sequence_hash = Hash.new
-
- # create an array of strains
-
- # hash key is strain_name, loop through strain_names
- # create an empty array
- strains.each do |strain_name|
- sequence_hash[strain_name] = Array.new
- end
-
- snps = Snp.find_by_sql("SELECT snps.* FROM snps
- INNER JOIN features
- ON features.id = snps.feature_id
- WHERE features.id IN
- (select features.id from features
- WHERE id NOT IN
- (select distinct features.id FROM features
- INNER JOIN annotations ON
- annotations.feature_id = features.id
- WHERE annotations.value LIKE '%(#{annotation})%'))")
-
-
- #puts snps.size
- puts "Your Query is submitted and is being processed......."
- snps.each do |snp|
- #break if i == 100
- snp.alleles.each do |allele|
- allele.genotypes.each do |genotype|
- # puts genotype.inspect
- sequence_hash[genotype.strain.name] << allele.base
- end
- end
- end
-
- strains.each do |sn|
- outfile.print ">#{sn}\n" , sequence_hash[sn].join("")
- outfile.puts
- end
-
- rescue
- end
data/examples/example2.rb DELETED
@@ -1,61 +0,0 @@
- # This query script finds the unique snps amongs the list of strains provided.
- # Only use this script once your database has been fully populated.
- # Usage: ruby example2.rb -d your_db_name.sqlite3 -s list_of_your_species.txt
- # Output is the number of unique snps in the list of your strains provided in the -s option.
- # You may use this script to do other SQL queries. Just change the SQL query below with your query.
-
- require 'snp_db_models'
- gem "slop", "~> 2.4.0"
- require 'slop'
-
- opts = Slop.new :help do
- banner "ruby query.rb [OPTIONS]"
-
- on :V, :verbose, 'Enable verbose mode'
- on :D, :database=, 'The name of the database you like to query', true
- on :s, :strain=, 'The strains/samples you like to query', true
-
- on_empty do
- puts help
- end
- end
- opts.parse
-
- puts "You must supply the -D option, it's a required field" and exit unless opts[:database]
- puts "You must supply the -s option, it's a required field" and exit unless opts[:strain]
-
- begin
- puts "#{opts[:database]} file does not exist!" and exit unless File.exist?(opts[:database])
- rescue
- end
-
- begin
- puts "#{opts[:strain]} file does not exist!" and exit unless File.exist?(opts[:strain])
- rescue
- end
-
-
- establish_connection(opts[:database])
-
- begin
- strains = []
- File.read(opts[:strain]).each_line do |line|
- strains << line.chop
- end
-
- def find_shared_snps(strain_names)
- *strain_names = strain_names
-
- where_statement = strain_names.collect{|strain_name| "strains.name = '#{strain_name}' OR "}.join("").sub(/ OR $/, "")
-
- return Snp.find_by_sql("SELECT * FROM (SELECT features.* from features INNER JOIN snps ON features.id = snps.feature_id INNER JOIN alleles ON alleles.snp_id = snps.id INNER JOIN genotypes ON alleles.id = genotypes.allele_id INNER JOIN strains ON strains.id = genotypes.strain_id WHERE (#{where_statement}) AND alleles.id <> snps.reference_allele_id AND (SELECT COUNT(*) from snps AS s INNER JOIN alleles ON alleles.snp_id = snps.id INNER JOIN genotypes ON alleles.id = genotypes.allele_id WHERE alleles.id <> snps.reference_allele_id and s.id = snps.id) = #{strain_names.size} GROUP BY snps.id HAVING COUNT(*) = #{strain_names.size})");
- end
-
- gas_snps = find_shared_snps(strains)
-
- gas_snps.each do |snp|
- puts "The number of unique snps are #{snp.id}"
- end
-
- rescue
- end
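The deleted example scripts, like the new executable, read the strain list with `line.chop`, which removes the last character of each line whether or not it is a newline. A short sketch of the difference, using an in-memory string in place of the strain file; `read_strain_names` is a hypothetical helper name, not something the gem defines:

```ruby
# Hypothetical helper: one strain name per line, blank lines skipped.
# chomp strips only a trailing line terminator, so a final line without
# a newline keeps its last character.
def read_strain_names(text)
  text.each_line.map(&:chomp).reject(&:empty?)
end

list = "strain_A\nstrain_B"   # note: no newline after the last name

puts list.each_line.map(&:chop).inspect   # last name loses a character
puts read_strain_names(list).inspect
```

A truncated name would not match any row in the strains table, so a strain list saved without a final newline could silently drop its last strain from a query.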
@@ -1,32 +0,0 @@
- require 'snp_db_connection'
-
- class Strain < ActiveRecord::Base
- has_many :alleles, :through => :genotypes
- has_many :genotypes
- end
-
- class Feature < ActiveRecord::Base
- has_many :annotations
- has_many :snps
- end
-
- class Snp < ActiveRecord::Base
- belongs_to :feature
- has_many :alleles
- belongs_to :reference_allele, :class_name => "Allele", :foreign_key => "reference_allele_id"
- end
-
- class Allele < ActiveRecord::Base
- has_many :genotypes
- belongs_to :snp
- has_many :strains, :through => :genotypes
- end
-
- class Genotype < ActiveRecord::Base
- belongs_to :allele
- belongs_to :strain
- end
-
- class Annotation < ActiveRecord::Base
- belongs_to :feature
- end