snp-search 0.34.0 → 1.0.0
- data/README.rdoc +46 -37
- data/VERSION +1 -1
- data/bin/snp-search +241 -42
- data/lib/snp-search.rb +11 -0
- data/lib/snp_db_connection.rb +0 -2
- data/snp-search.gemspec +2 -5
- metadata +21 -24
- data/examples/example1.rb +0 -92
- data/examples/example2.rb +0 -61
- data/examples/snp_db_models.rb +0 -32
data/README.rdoc
CHANGED
@@ -1,6 +1,6 @@
|
|
1
1
|
= snp-search
|
2
2
|
|
3
|
-
SNPsearch is a tool that manages SNP data and allows for data importing, manipulating, editing and complex querying of SNP data. It can be used to evaluate the utility of SNPs for the assessment of genetic diversity between haploid strains and the management of genotype and phenotype data. Once
|
3
|
+
SNPsearch is a tool that manages SNP data and allows for data importing, manipulating, editing and complex querying of SNP data. It can be used to evaluate the utility of SNPs for the assessment of genetic diversity between haploid strains and the management of genotype and phenotype data. Once the database is created, the user is provided with several query and output options. SNPsearch is particularly useful in the analysis of phylogenetic trees that are based on SNP differences across whole core genomes. Queries can be made to answer critical genomic questions such as the association of SNPs with particular phenotypes.
|
4
4
|
|
5
5
|
== Obtaining and installing the code
|
6
6
|
SNPsearch is written in Ruby and operates in a Unix environment. It is made available as a gem. See the github site for more information (https://github.com/hpa-bioinformatics/snp-search).
|
@@ -15,70 +15,79 @@ Not much, you just need:
|
|
15
15
|
* Unix. Once snp-search is installed, all the necessary gems to run snp-search will also be installed from Rubygems (note that Rubygems requires admin privileges. If you do not have admin privileges then we suggest you install RVM: (http://beginrescueend.com/rvm/install/) and then gem install snp-search).
|
16
16
|
* ruby version 1.8.7 and above.
|
17
17
|
|
18
|
+
* Optional: FastTree. If you require tree output in Newick format, install FastTree from http://www.microbesonline.org/fasttree/#Install. snp-search runs the command as plain 'FastTree', so the directory containing the executable must be on your PATH (for example, set in your .bashrc or .profile file); otherwise snp-search will not be able to find it.
|
19
|
+
|
18
20
|
That's it!
|
19
21
|
|
20
22
|
== Running snp-search
|
21
23
|
|
22
|
-
|
24
|
+
1- Creating the database (snp-search -create)
|
25
|
+
|
26
|
+
Two files are needed to create the SQLite3 database:
|
23
27
|
|
24
|
-
1- Variant Call Format (.vcf) file (which contains the SNP information)
|
28
|
+
1- Variant Call Format (.vcf) file (which contains the SNP information)
|
25
29
|
|
26
|
-
2- Your database reference genome that you used to generate your .vcf file (in genbank or embl format, the script will automatically detect the format).
|
30
|
+
2- Your database reference genome that you used to generate your .vcf file (in genbank or embl format, the script will automatically detect the format).
|
27
31
|
|
28
|
-
|
32
|
+
You need the following parameters:
|
29
33
|
|
30
|
-
-V Enable verbose mode
|
31
34
|
-n Name of your database
|
32
|
-
-v .vcf file
|
33
|
-
-d Database Reference genome (The same file that was used in generating the .vcf file). This should be in genbank or embl format.
|
35
|
+
-v .vcf file
|
36
|
+
-d Database Reference genome (The same file that was used in generating the .vcf file). This should be in genbank or embl format.
|
37
|
+
|
38
|
+
Other options:
|
34
39
|
-c SNP quality score cutoff. A Phred-scaled quality score. High quality scores indicate high confidence calls. Optional, default = 90 (out of 100)
|
35
|
-
-
|
40
|
+
-g Genotype Quality score cutoff. Phred-scaled quality score that the genotype is true. Optional, default = 30
|
36
41
|
-h help message
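
The -c and -g cutoffs are described as Phred-scaled, so each value corresponds to an error probability through Q = -10 * log10(P). The following is a minimal sketch of that conversion, assuming the standard Phred definition; the helper name phred_to_error_probability is hypothetical and is not part of snp-search.

```ruby
# Hypothetical helper (not part of snp-search): convert a Phred-scaled
# quality score Q into the probability that the call is wrong,
# using the standard definition Q = -10 * log10(P).
def phred_to_error_probability(q)
  10.0**(-q / 10.0)
end

# The README's default cutoffs, expressed as error probabilities.
snp_cutoff_error = phred_to_error_probability(90)      # default for -c
genotype_cutoff_error = phred_to_error_probability(30) # default for -g

puts "SNP cutoff 90 -> error probability #{snp_cutoff_error}"
puts "Genotype cutoff 30 -> error probability #{genotype_cutoff_error}"
```

A higher cutoff therefore keeps only calls with a smaller chance of being wrong, which is why raising -c or -g shrinks the set of SNPs that end up in the database.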
|
37
42
|
|
38
|
-
Usage:
|
39
|
-
|
43
|
+
Usage:
|
44
|
+
snp-search -create -n my_snp_db.sqlite3 -d my_ref.gbk -v my_vcf_file.vcf
|
40
45
|
|
41
|
-
Note: The strain names in your database will be taken from your vcf file so make sure they are named appropriately in your vcf file.
|
46
|
+
Note: The strain names in your database will be taken from your vcf file so make sure they are named appropriately in your vcf file.
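
To check which strain names a .vcf file will contribute before creating the database, you can inspect its header. In the VCF format the "#CHROM" header line has nine fixed columns, and every column after them is a sample name. The sketch below relies on that layout; the helper name vcf_sample_names and the example data are hypothetical and are not part of snp-search.

```ruby
# Hypothetical helper (not part of snp-search): list the sample names
# declared in a VCF header. The "#CHROM" line has nine fixed columns
# (CHROM through FORMAT); every column after those is a sample name.
def vcf_sample_names(vcf_text)
  header = vcf_text.each_line.find { |line| line.start_with?("#CHROM") }
  return [] if header.nil?
  header.chomp.split("\t").drop(9)
end

# Made-up example data with two samples.
example_vcf = [
  "##fileformat=VCFv4.1",
  %w[#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT strain_A strain_B].join("\t"),
  %w[ref 101 . A G 120 . DP=40 GT:GQ 0/0:99 1/1:99].join("\t")
].join("\n")

puts vcf_sample_names(example_vcf).inspect
```

If the names printed this way are not the ones you want to see in the database, rename the samples in the .vcf file before running snp-search -create.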
|
42
47
|
|
43
|
-
|
44
|
-
The output is your database in sqlite3 format. If you like to view your table(s) and perform queries you can type
|
45
|
-
sqlite3 snp_db.sqlite3
|
48
|
+
2- Querying the Database (snp-search -query)
|
46
49
|
|
47
|
-
|
50
|
+
Two queries are currently scripted in SNPsearch:
|
48
51
|
|
49
|
-
|
52
|
+
1- genes_query: This option queries the database and selects the number of unique SNPs within the list of the strains/samples provided. The output is the number of unique SNPs.
|
50
53
|
|
51
|
-
|
54
|
+
You need the following parameters:
|
52
55
|
|
53
|
-
|
56
|
+
-n Name of your database
|
57
|
+
-s The strains/samples you like to query
|
54
58
|
|
55
|
-
|
59
|
+
Usage:
|
60
|
+
snp-search -n my_snp_db.sqlite3 -s list_of_my_strains.txt
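
The executable reads the -s file line by line, so the file is expected to hold one strain name per line. A minimal sketch of that reading step follows; it trims whitespace and skips blank lines, which is an assumption about how you may want to treat such a file rather than a description of the shipped code, and the helper name read_strain_names is hypothetical.

```ruby
require 'tempfile'

# Hypothetical helper (not part of snp-search): read one strain name per
# line, trimming surrounding whitespace and skipping blank lines.
def read_strain_names(path)
  File.readlines(path).map { |line| line.strip }.reject { |name| name.empty? }
end

# Demonstration with a temporary file standing in for list_of_my_strains.txt.
Tempfile.create("strains") do |file|
  file.write("strain_A\nstrain_B\n\nstrain_C")
  file.flush
  puts read_strain_names(file.path).inspect
end
```

Trimming with strip keeps the last name intact even when the file has no trailing newline.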
|
56
61
|
|
57
|
-
|
62
|
+
2- remove_genes: This option queries the database to select only those SNPs not found in a specified gene. These SNPs are used to make a concatenated SNP multiple alignment file (FASTA format). This is a way of removing a set of genes (likely to be mobile element genes) that are not needed for SNP analysis. The user has the option of generating a core SNP tree Newick file for SNP phylogeny.
|
58
63
|
|
59
|
-
|
64
|
+
You need the following parameters:
|
60
65
|
|
61
|
-
|
66
|
+
-n Name of your database
|
67
|
+
-a The gene you like to remove from analysis
|
68
|
+
-o Output file, in fasta format
|
62
69
|
|
63
|
-
|
64
|
-
-
|
65
|
-
-
|
66
|
-
-s, The strains/samples you like to query, Required
|
67
|
-
-a, The gene you like to remove from analysis
|
68
|
-
-h, Print this help message
|
70
|
+
options:
|
71
|
+
-t Generate SNP phylogeny
|
72
|
+
-w Output tree in Newick format
|
69
73
|
|
70
|
-
|
74
|
+
Usage (phage is used as the example gene):
|
75
|
+
snp-search -n my_snp_db.sqlite3 -a phage -o snps_sequences_without_phage.fasta -t -w snps_sequences_without_phage.nwk
|
71
76
|
|
72
|
-
|
77
|
+
The algorithm FastTree is used to generate the nwk file. FastTree can be downloaded from http://www.microbesonline.org/fasttree/#Install (see above)
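
Because snp-search calls FastTree by name, it can help to confirm that the command is on your PATH before requesting a tree. The following is a small sketch of such a check; the helper name executable_on_path? is hypothetical and is not part of snp-search.

```ruby
# Hypothetical helper (not part of snp-search): report whether a command
# name resolves to an executable file in any directory listed in PATH.
def executable_on_path?(command, path_env = ENV["PATH"])
  path_env.to_s.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, command)
    File.file?(candidate) && File.executable?(candidate)
  end
end

if executable_on_path?("FastTree")
  puts "FastTree found on PATH"
else
  puts "FastTree not found; add its directory to PATH in .bashrc or .profile"
end
```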
|
73
78
|
|
74
|
-
|
79
|
+
3- Output database (snp-search -out_file)
|
75
80
|
|
76
|
-
|
81
|
+
You need the following parameters:
|
77
82
|
|
78
|
-
-
|
79
|
-
-
|
80
|
-
|
81
|
-
|
83
|
+
-n Name of your database
|
84
|
+
-o Output file containing the database in fasta format
|
85
|
+
|
86
|
+
== View database in Unix or in a GUI
|
87
|
+
Your database will be in sqlite3 format. If you would like to view your table(s) and run queries directly, you can type
|
88
|
+
sqlite3 snp_db.sqlite3
|
89
|
+
|
90
|
+
Alternatively, you may download a SQL tool to view your database (e.g. SQLite sorcerer).
|
82
91
|
|
83
92
|
== Contact
|
84
93
|
|
data/VERSION
CHANGED
@@ -1 +1 @@
|
|
1
|
-
0.
|
1
|
+
1.0.0
|
data/bin/snp-search
CHANGED
@@ -2,67 +2,266 @@ require 'snp-search'
|
|
2
2
|
require 'snp_db_connection'
|
3
3
|
require 'snp_db_models'
|
4
4
|
require 'snp_db_schema'
|
5
|
-
|
5
|
+
require 'activerecord-import'
|
6
|
+
# gem "slop", "~> 3.1.0"
|
6
7
|
gem "slop", "~> 2.4.0"
|
7
8
|
require 'slop'
|
8
9
|
|
9
|
-
opts = Slop.new
|
10
|
-
|
10
|
+
opts = Slop.new do
|
11
|
+
|
12
|
+
# separator 'test'
|
11
13
|
|
12
|
-
|
13
|
-
on :
|
14
|
-
on :
|
15
|
-
on :
|
16
|
-
|
17
|
-
|
18
|
-
|
14
|
+
banner "\nruby snp-search [OPTIONS]"
|
15
|
+
on :C, :create, 'Create database'
|
16
|
+
on :Q, :query, 'Query database'
|
17
|
+
on :O, :out_file, 'Output the database to a file'
|
18
|
+
# separator ''
|
19
|
+
# separator 'README file: https://github.com/hpa-bioinformatics/snp-search/blob/master/README.rdoc'
|
20
|
+
# separator 'The following command must be used when using -create, or -query or -out_file'
|
21
|
+
on :n, :name=, 'Name of database, Required'
|
22
|
+
# separator ''
|
23
|
+
# separator '-create options'
|
24
|
+
on :d, :database_reference_file, 'Reference genome file, in gbk or embl file format, Required', true
|
25
|
+
on :v, :vcf_file, '.vcf file, Required', true
|
26
|
+
on :c, :cuttoff_snp, 'SNP quality cutoff, (default = 90)', :default => 90
|
27
|
+
on :g, :cuttoff_genotype, 'Genotype quality cutoff (default = 30)', :default => 30
|
28
|
+
# separator ''
|
29
|
+
# separator '-query options'
|
30
|
+
on :G, :genes_query, 'Query for unique genes in the database'
|
31
|
+
on :R, :remove_genes, 'Remove set of genes from database and create FASTA file'
|
32
|
+
on :s, :strain=, 'The strains/samples you like to query, Required'
|
33
|
+
on :a, :annotation=, 'The gene you like to remove from analysis'
|
34
|
+
on :o, :output=, 'output file, in fasta format'
|
35
|
+
on :t, :tree, 'Generate SNP phylogeny'
|
36
|
+
on :w, :tree_nwk_output=, 'output tree in Newick format'
|
37
|
+
on :S, :syn, 'syn'
|
19
38
|
end
|
20
39
|
opts.parse
|
21
40
|
|
22
|
-
|
41
|
+
###########################################################
|
23
42
|
|
24
|
-
|
25
|
-
|
26
|
-
error_msg += "You must supply the -v option, it's a required field" unless opts[:vcf_file]
|
43
|
+
# CREATING A DATABASE
|
44
|
+
if opts[:create]
|
27
45
|
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
|
46
|
+
|
47
|
+
error_msg = ""
|
48
|
+
|
49
|
+
error_msg += "-n option: \t the name of your database\n" unless opts[:name]
|
50
|
+
error_msg += "-d option: \t reference genome file, in gbk or embl file format\n" unless opts[:database_reference_file]
|
51
|
+
error_msg += "-v option: \t .vcf file\n" unless opts[:vcf_file]
|
52
|
+
|
53
|
+
unless error_msg == ""
|
54
|
+
puts "Please provide the following required fields:"
|
55
|
+
puts error_msg
|
56
|
+
puts opts.help unless opts.empty?
|
57
|
+
exit
|
58
|
+
end
|
59
|
+
|
60
|
+
|
61
|
+
abort "#{opts[:database_reference_file]} file does not exist!" unless File.exist?(opts[:database_reference_file])
|
62
|
+
|
63
|
+
abort "#{opts[:vcf_file]} file does not exist!" unless File.exist?(opts[:vcf_file])
|
64
|
+
|
65
|
+
|
66
|
+
# Name of your database
|
67
|
+
establish_connection(opts[:name])
|
68
|
+
|
69
|
+
# Schema will run here
|
70
|
+
db_schema
|
71
|
+
|
72
|
+
ref = opts[:database_reference_file]
|
73
|
+
|
74
|
+
sequence_format = guess_sequence_format(ref)
|
75
|
+
|
76
|
+
case sequence_format
|
77
|
+
when :genbank
|
78
|
+
sequence_flatfile = Bio::FlatFile.open(Bio::GenBank,opts[:database_reference_file]).next_entry
|
79
|
+
when :embl
|
80
|
+
sequence_flatfile = Bio::FlatFile.open(Bio::EMBL,opts[:database_reference_file]).next_entry
|
81
|
+
else
|
82
|
+
puts "All sequence files should be in genbank or embl format"
|
83
|
+
exit
|
84
|
+
end
|
85
|
+
|
86
|
+
# path for vcf file here
|
87
|
+
vcf_mpileup_file = opts[:vcf_file]
|
88
|
+
|
89
|
+
# The populate_features_and_annotations method populates the features and annotations. It uses the embl/gbk file.
|
90
|
+
populate_features_and_annotations(sequence_flatfile)
|
91
|
+
|
92
|
+
#The populate_snps_alleles_genotypes method populates the snps, alleles and genotypes. It uses the vcf file, and if specified, the SNP quality cutoff and genotype quality cutoff
|
93
|
+
populate_snps_alleles_genotypes(vcf_mpileup_file, opts[:cuttoff_snp].to_i, opts[:cuttoff_genotype].to_i)
|
94
|
+
|
95
|
+
###########################################################
|
96
|
+
|
97
|
+
# QUERYING THE DATABASE
|
98
|
+
elsif opts [:query]
|
99
|
+
#FIND UNIQUE SNPS
|
100
|
+
if opts[:genes_query]
|
101
|
+
|
102
|
+
error_msg = ""
|
103
|
+
|
104
|
+
error_msg += "-n option, \t the name of your database\n" unless opts[:name]
|
105
|
+
error_msg += "-s option, \t list of strains you like to query\n" unless opts[:strain]
|
106
|
+
|
107
|
+
unless error_msg == ""
|
108
|
+
puts "Please provide the following required fields:"
|
109
|
+
puts error_msg
|
110
|
+
puts opts.help unless opts.empty?
|
111
|
+
exit
|
112
|
+
end
|
113
|
+
|
114
|
+
abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
|
115
|
+
abort "#{opts[:strain]} file does not exist!" unless File.exist?(opts[:strain])
|
116
|
+
|
117
|
+
establish_connection(opts[:name])
|
118
|
+
|
119
|
+
strains = []
|
120
|
+
File.read(opts[:strain]).each_line do |line|
|
121
|
+
strains << line.chop
|
122
|
+
end
|
123
|
+
|
124
|
+
# puts find_shared_snps(strains)
|
125
|
+
# exit
|
126
|
+
gas_snps = find_shared_snps(strains)
|
127
|
+
|
128
|
+
gas_snps.each do |snp|
|
129
|
+
puts "The number of unique snps are #{snp.id}.size"
|
130
|
+
end
|
131
|
+
|
132
|
+
################################################################
|
133
|
+
# REMOVE SNPS ASSOCIATED WITH SPECIFIC GENES
|
134
|
+
elsif opts[:remove_genes]
|
135
|
+
|
136
|
+
error_msg = ""
|
137
|
+
|
138
|
+
error_msg += "-n option: \t the name of your database\n" unless opts[:name]
|
139
|
+
error_msg += "-o option: \t name of your output file\n" unless opts[:output]
|
140
|
+
error_msg += "-a option: \t name of the gene that you like to remove from the database\n" unless opts[:annotation]
|
141
|
+
|
142
|
+
unless error_msg == ""
|
143
|
+
puts "Please provide the following required fields:"
|
144
|
+
puts error_msg
|
145
|
+
puts opts.help unless opts.empty?
|
146
|
+
exit
|
147
|
+
end
|
148
|
+
|
149
|
+
abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
|
150
|
+
|
151
|
+
# annotation = opts[:annotation]
|
152
|
+
establish_connection(opts[:name])
|
153
|
+
|
154
|
+
|
155
|
+
# Getting list of strains from database
|
156
|
+
strains = Strain.all
|
157
|
+
|
158
|
+
sequence_hash = Hash.new
|
159
|
+
# create a sequence hash
|
160
|
+
# hash key is strain_id, loop through strain_id
|
161
|
+
# create an empty array
|
162
|
+
strains.each do |strain|
|
163
|
+
sequence_hash[strain.id] = Array.new
|
164
|
+
end
|
165
|
+
|
166
|
+
# output opened for data input
|
167
|
+
output = File.open("#{opts[:output]}", "w")
|
168
|
+
|
169
|
+
# Perform query
|
170
|
+
snps = Snp.includes(:alleles => :genotypes).find_by_sql("SELECT snps.* FROM snps INNER JOIN features ON features.id = snps.feature_id WHERE features.id NOT IN (select distinct features.id FROM features INNER JOIN annotations ON annotations.feature_id = features.id WHERE annotations.value LIKE '%#{opts[:annotation]}%')")
|
171
|
+
|
172
|
+
i = 0
|
173
|
+
puts "Your Query is submitted and is being processed......."
|
174
|
+
snps.each do |snp|
|
175
|
+
# puts snp.inspect
|
176
|
+
i += 1
|
177
|
+
puts "Total number of SNPs generated so far: #{i}" if i % 100 == 0
|
178
|
+
ActiveRecord::Base.transaction do
|
179
|
+
snp.alleles.each do |allele|
|
180
|
+
# puts allele.inspect
|
181
|
+
allele.genotypes.each do |genotype|
|
182
|
+
#push bases to hash
|
183
|
+
sequence_hash[genotype.strain_id] << allele.base
|
184
|
+
end
|
185
|
+
end
|
186
|
+
end
|
187
|
+
end
|
188
|
+
|
189
|
+
#generate FASTA file
|
190
|
+
strains.each do |strain|
|
191
|
+
output.print ">#{strain.name}\n" , sequence_hash[strain.id].join("")
|
192
|
+
output.puts
|
193
|
+
end
|
194
|
+
|
195
|
+
if opts[:tree]
|
196
|
+
`FastTree -fastest -nt #{opts[:output]} > #{opts[:w]}`
|
197
|
+
end
|
32
198
|
end
|
33
|
-
|
34
|
-
abort "#{opts[:database_reference_file]} file does not exist!" unless File.exist?(opts[:database_reference_file])
|
35
|
-
|
36
|
-
abort "#{opts[:vcf_file]} file does not exist!" unless File.exist?(opts[:vcf_file])
|
37
199
|
|
200
|
+
##############################################################
|
38
201
|
|
39
|
-
#
|
40
|
-
|
202
|
+
# OUTPUT DATABASE IN FASTA FORMAT
|
203
|
+
elsif opts[:out_file]
|
204
|
+
error_msg = ""
|
41
205
|
|
42
|
-
|
43
|
-
|
206
|
+
error_msg += "-n option: \t the name of your database\n" unless opts[:name]
|
207
|
+
error_msg += "-o option: \t name of your output file\n" unless opts[:output]
|
44
208
|
|
45
|
-
|
209
|
+
unless error_msg == ""
|
210
|
+
puts "Please provide the following required fields:"
|
211
|
+
puts error_msg
|
212
|
+
puts opts.help unless opts.empty?
|
213
|
+
exit
|
214
|
+
end
|
46
215
|
|
47
|
-
|
216
|
+
abort "#{opts[:name]} database does not exist!" unless File.exist?(opts[:name])
|
48
217
|
|
49
|
-
|
50
|
-
|
51
|
-
|
52
|
-
|
53
|
-
|
54
|
-
|
55
|
-
|
56
|
-
|
218
|
+
establish_connection(opts[:name])
|
219
|
+
|
220
|
+
# Getting list of strains from database
|
221
|
+
strains = Strain.all
|
222
|
+
|
223
|
+
sequence_hash = Hash.new
|
224
|
+
# create a sequence hash
|
225
|
+
# hash key is strain_id, loop through strain_id
|
226
|
+
# create an empty array
|
227
|
+
strains.each do |strain|
|
228
|
+
sequence_hash[strain.id] = Array.new
|
57
229
|
end
|
58
230
|
|
59
231
|
|
60
|
-
|
61
|
-
|
232
|
+
output = File.open("#{opts[:output]}", "w")
|
233
|
+
|
234
|
+
# Select all snps
|
235
|
+
snps = Snp.all
|
236
|
+
|
237
|
+
i = 0
|
238
|
+
puts "Your out file is being prepared......."
|
239
|
+
snps.each do |snp|
|
240
|
+
i += 1
|
241
|
+
puts "Total number of SNPs outputted so far: #{i}" if i % 100 == 0
|
62
242
|
|
63
|
-
|
64
|
-
|
243
|
+
ActiveRecord::Base.transaction do
|
244
|
+
snp.alleles.each do |allele|
|
245
|
+
# puts allele.inspect
|
246
|
+
allele.genotypes.each do |genotype|
|
247
|
+
#push bases to hash
|
248
|
+
sequence_hash[genotype.strain_id] << allele.base
|
249
|
+
end
|
250
|
+
end
|
251
|
+
end
|
252
|
+
|
65
253
|
|
66
|
-
#
|
67
|
-
|
254
|
+
#generate FASTA file
|
255
|
+
strains.each do |strain|
|
256
|
+
output.print ">#{strain.name}\n" , sequence_hash[strain.id].join("")
|
257
|
+
output.puts
|
258
|
+
end
|
68
259
|
|
260
|
+
if opts[:tree]
|
261
|
+
`FastTree -fastest -nt #{opts[:output]} > #{opts[:w]}`
|
262
|
+
end
|
263
|
+
end
|
264
|
+
|
265
|
+
else
|
266
|
+
puts opts.help
|
267
|
+
end
|
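
Both the remove_genes and out_file branches above finish by writing one FASTA record per strain, with that strain's SNP bases concatenated into a single sequence. The sketch below isolates that step without a database; the helper name write_snp_fasta and the example bases are hypothetical, and it keys the hash by strain name for brevity where the executable uses strain ids.

```ruby
require 'stringio'

# Hypothetical helper (not part of snp-search): write one FASTA record per
# strain, concatenating that strain's SNP bases into a single sequence line.
def write_snp_fasta(io, strain_names, bases_by_strain)
  strain_names.each do |name|
    io.puts ">#{name}"
    io.puts bases_by_strain.fetch(name, []).join
  end
end

# Made-up bases: three SNP positions for two strains.
bases_by_strain = {
  "strain_A" => %w[A C G],
  "strain_B" => %w[A T G]
}

buffer = StringIO.new
write_snp_fasta(buffer, bases_by_strain.keys, bases_by_strain)
print buffer.string
```

Passing any IO-like object (a File opened for writing, or a StringIO as here) keeps the formatting logic easy to exercise on its own.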
data/lib/snp-search.rb
CHANGED
@@ -166,3 +166,14 @@ puts "Adding SNPs........"
|
|
166
166
|
snp.save
|
167
167
|
end
|
168
168
|
end
|
169
|
+
|
170
|
+
def find_shared_snps(strain_names)
|
171
|
+
*strain_names = strain_names
|
172
|
+
|
173
|
+
where_statement = strain_names.collect{|strain_name| "strains.name = '#{strain_name}' OR "}.join("").sub(/ OR $/, "")
|
174
|
+
|
175
|
+
puts "Snp.find_by_sql(\"SELECT * from snps INNER JOIN alleles ON alleles.snp_id = snps.id INNER JOIN genotypes ON alleles.id = genotypes.allele_id INNER JOIN strains ON strains.id = genotypes.strain_id WHERE (#{where_statement}) AND alleles.id <> snps.reference_allele_id AND (SELECT COUNT(*) from snps AS s INNER JOIN alleles ON alleles.snp_id = snps.id INNER JOIN genotypes ON alleles.id = genotypes.allele_id WHERE alleles.id <> snps.reference_allele_id and s.id = snps.id) = #{strain_names.size} GROUP BY snps.id HAVING COUNT(*) = #{strain_names.size}\")"
|
176
|
+
end
|
177
|
+
|
178
|
+
|
179
|
+
|
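
find_shared_snps builds its strain filter by interpolating each name into the SQL text. As an alternative sketch (not the method used in this release), the same filter can be expressed with one "?" placeholder per name and the names passed separately; ActiveRecord's find_by_sql accepts an array of this shape. The helper name strain_name_condition is hypothetical, and only the condition fragment is built here, not the full query.

```ruby
# Hypothetical helper (not part of snp-search): build a condition with one
# "?" placeholder per strain name, returned together with the values to
# bind, instead of interpolating the names into the SQL text.
def strain_name_condition(strain_names)
  placeholders = Array.new(strain_names.size, "?").join(", ")
  ["strains.name IN (#{placeholders})", *strain_names]
end

condition = strain_name_condition(["strain_A", "strain_B"])
puts condition.inspect
```

An IN list matches the same rows as the chain of OR comparisons, and binding the values means quotes inside a strain name cannot change the query. An empty strain list would produce an empty IN list, so callers should check for that case first.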
data/lib/snp_db_connection.rb
CHANGED
data/snp-search.gemspec
CHANGED
@@ -5,11 +5,11 @@
|
|
5
5
|
|
6
6
|
Gem::Specification.new do |s|
|
7
7
|
s.name = "snp-search"
|
8
|
-
s.version = "0.
|
8
|
+
s.version = "1.0.0"
|
9
9
|
|
10
10
|
s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
|
11
11
|
s.authors = ["Ali Al-Shahib", "Anthony Underwood"]
|
12
|
-
s.date = "2012-
|
12
|
+
s.date = "2012-05-10"
|
13
13
|
s.description = "Use the snp-search tool to create, import, manipulate and query your SNP database"
|
14
14
|
s.email = "ali.al-shahib@hpa.org.uk"
|
15
15
|
s.executables = ["snp-search"]
|
@@ -28,9 +28,6 @@ Gem::Specification.new do |s|
|
|
28
28
|
"Rakefile",
|
29
29
|
"VERSION",
|
30
30
|
"bin/snp-search",
|
31
|
-
"examples/example1.rb",
|
32
|
-
"examples/example2.rb",
|
33
|
-
"examples/snp_db_models.rb",
|
34
31
|
"lib/snp-search.rb",
|
35
32
|
"lib/snp_db_connection.rb",
|
36
33
|
"lib/snp_db_models.rb",
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: snp-search
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 1.0.0
|
5
5
|
prerelease:
|
6
6
|
platform: ruby
|
7
7
|
authors:
|
@@ -10,11 +10,11 @@ authors:
|
|
10
10
|
autorequire:
|
11
11
|
bindir: bin
|
12
12
|
cert_chain: []
|
13
|
-
date: 2012-
|
13
|
+
date: 2012-05-10 00:00:00.000000000Z
|
14
14
|
dependencies:
|
15
15
|
- !ruby/object:Gem::Dependency
|
16
16
|
name: activerecord
|
17
|
-
requirement: &
|
17
|
+
requirement: &2165230340 !ruby/object:Gem::Requirement
|
18
18
|
none: false
|
19
19
|
requirements:
|
20
20
|
- - ~>
|
@@ -22,10 +22,10 @@ dependencies:
|
|
22
22
|
version: 3.1.3
|
23
23
|
type: :runtime
|
24
24
|
prerelease: false
|
25
|
-
version_requirements: *
|
25
|
+
version_requirements: *2165230340
|
26
26
|
- !ruby/object:Gem::Dependency
|
27
27
|
name: bio
|
28
|
-
requirement: &
|
28
|
+
requirement: &2165229420 !ruby/object:Gem::Requirement
|
29
29
|
none: false
|
30
30
|
requirements:
|
31
31
|
- - ~>
|
@@ -33,10 +33,10 @@ dependencies:
|
|
33
33
|
version: 1.4.2
|
34
34
|
type: :runtime
|
35
35
|
prerelease: false
|
36
|
-
version_requirements: *
|
36
|
+
version_requirements: *2165229420
|
37
37
|
- !ruby/object:Gem::Dependency
|
38
38
|
name: slop
|
39
|
-
requirement: &
|
39
|
+
requirement: &2165228320 !ruby/object:Gem::Requirement
|
40
40
|
none: false
|
41
41
|
requirements:
|
42
42
|
- - ~>
|
@@ -44,10 +44,10 @@ dependencies:
|
|
44
44
|
version: 2.4.0
|
45
45
|
type: :runtime
|
46
46
|
prerelease: false
|
47
|
-
version_requirements: *
|
47
|
+
version_requirements: *2165228320
|
48
48
|
- !ruby/object:Gem::Dependency
|
49
49
|
name: sqlite3
|
50
|
-
requirement: &
|
50
|
+
requirement: &2165227400 !ruby/object:Gem::Requirement
|
51
51
|
none: false
|
52
52
|
requirements:
|
53
53
|
- - ~>
|
@@ -55,10 +55,10 @@ dependencies:
|
|
55
55
|
version: 1.3.4
|
56
56
|
type: :runtime
|
57
57
|
prerelease: false
|
58
|
-
version_requirements: *
|
58
|
+
version_requirements: *2165227400
|
59
59
|
- !ruby/object:Gem::Dependency
|
60
60
|
name: activerecord-import
|
61
|
-
requirement: &
|
61
|
+
requirement: &2165226380 !ruby/object:Gem::Requirement
|
62
62
|
none: false
|
63
63
|
requirements:
|
64
64
|
- - ~>
|
@@ -66,10 +66,10 @@ dependencies:
|
|
66
66
|
version: 0.2.8
|
67
67
|
type: :runtime
|
68
68
|
prerelease: false
|
69
|
-
version_requirements: *
|
69
|
+
version_requirements: *2165226380
|
70
70
|
- !ruby/object:Gem::Dependency
|
71
71
|
name: rspec
|
72
|
-
requirement: &
|
72
|
+
requirement: &2165225400 !ruby/object:Gem::Requirement
|
73
73
|
none: false
|
74
74
|
requirements:
|
75
75
|
- - ~>
|
@@ -77,10 +77,10 @@ dependencies:
|
|
77
77
|
version: 2.3.0
|
78
78
|
type: :development
|
79
79
|
prerelease: false
|
80
|
-
version_requirements: *
|
80
|
+
version_requirements: *2165225400
|
81
81
|
- !ruby/object:Gem::Dependency
|
82
82
|
name: bundler
|
83
|
-
requirement: &
|
83
|
+
requirement: &2165224600 !ruby/object:Gem::Requirement
|
84
84
|
none: false
|
85
85
|
requirements:
|
86
86
|
- - ~>
|
@@ -88,10 +88,10 @@ dependencies:
|
|
88
88
|
version: 1.0.0
|
89
89
|
type: :development
|
90
90
|
prerelease: false
|
91
|
-
version_requirements: *
|
91
|
+
version_requirements: *2165224600
|
92
92
|
- !ruby/object:Gem::Dependency
|
93
93
|
name: jeweler
|
94
|
-
requirement: &
|
94
|
+
requirement: &2165223220 !ruby/object:Gem::Requirement
|
95
95
|
none: false
|
96
96
|
requirements:
|
97
97
|
- - ~>
|
@@ -99,10 +99,10 @@ dependencies:
|
|
99
99
|
version: 1.6.4
|
100
100
|
type: :development
|
101
101
|
prerelease: false
|
102
|
-
version_requirements: *
|
102
|
+
version_requirements: *2165223220
|
103
103
|
- !ruby/object:Gem::Dependency
|
104
104
|
name: rcov
|
105
|
-
requirement: &
|
105
|
+
requirement: &2165222000 !ruby/object:Gem::Requirement
|
106
106
|
none: false
|
107
107
|
requirements:
|
108
108
|
- - ! '>='
|
@@ -110,7 +110,7 @@ dependencies:
|
|
110
110
|
version: '0'
|
111
111
|
type: :development
|
112
112
|
prerelease: false
|
113
|
-
version_requirements: *
|
113
|
+
version_requirements: *2165222000
|
114
114
|
description: Use the snp-search tool to create, import, manipulate and query your
|
115
115
|
SNP database
|
116
116
|
email: ali.al-shahib@hpa.org.uk
|
@@ -131,9 +131,6 @@ files:
|
|
131
131
|
- Rakefile
|
132
132
|
- VERSION
|
133
133
|
- bin/snp-search
|
134
|
-
- examples/example1.rb
|
135
|
-
- examples/example2.rb
|
136
|
-
- examples/snp_db_models.rb
|
137
134
|
- lib/snp-search.rb
|
138
135
|
- lib/snp_db_connection.rb
|
139
136
|
- lib/snp_db_models.rb
|
@@ -156,7 +153,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
|
|
156
153
|
version: '0'
|
157
154
|
segments:
|
158
155
|
- 0
|
159
|
-
hash:
|
156
|
+
hash: 1630410471760364863
|
160
157
|
required_rubygems_version: !ruby/object:Gem::Requirement
|
161
158
|
none: false
|
162
159
|
requirements:
|
data/examples/example1.rb
DELETED
@@ -1,92 +0,0 @@
|
|
1
|
-
# This query script removes the 'phage' genes from the database.
|
2
|
-
# Only use this script once your database has been fully populated.
|
3
|
-
# Usage: ruby example1.rb -d your_db_name.sqlite3 -s list_of_your_species.txt -o output.fasta
|
4
|
-
# You may use this script to do other SQL queries that result in a fasta output. Just change the 'snps' SQL query below with your query.
|
5
|
-
require 'snp_db_models'
|
6
|
-
gem "slop", "~> 2.4.0"
|
7
|
-
require 'slop'
|
8
|
-
|
9
|
-
opts = Slop.new :help do
|
10
|
-
banner "ruby query.rb [OPTIONS]"
|
11
|
-
|
12
|
-
on :V, :verbose, 'Enable verbose mode'
|
13
|
-
on :D, :database=, 'The name of the database you like to query', true
|
14
|
-
on :o, :outfile=, 'output file, in fasta format', true
|
15
|
-
on :s, :strain=, 'The strains/samples you like to query', true
|
16
|
-
on :a, :annotation=, 'The gene you like to remove from analysis', true
|
17
|
-
|
18
|
-
on_empty do
|
19
|
-
puts help
|
20
|
-
end
|
21
|
-
end
|
22
|
-
opts.parse
|
23
|
-
|
24
|
-
puts "You must supply the -s option, it's a required field" and exit unless opts[:strain]
|
25
|
-
puts "You must supply the -D option, it's a required field" and exit unless opts[:database]
|
26
|
-
|
27
|
-
begin
|
28
|
-
puts "#{opts[:database]} file does not exist!" and exit unless File.exist?(opts[:database])
|
29
|
-
rescue
|
30
|
-
end
|
31
|
-
|
32
|
-
begin
|
33
|
-
puts "#{opts[:strain]} file does not exist!" and exit unless File.exist?(opts[:strain])
|
34
|
-
rescue
|
35
|
-
end
|
36
|
-
|
37
|
-
annotation = opts[:annotation]
|
38
|
-
establish_connection(opts[:database])
|
39
|
-
|
40
|
-
begin
|
41
|
-
strains = []
|
42
|
-
File.read(opts[:strain]).each_line do |line|
|
43
|
-
strains << line.chop
|
44
|
-
end
|
45
|
-
|
46
|
-
# Enter the name of your database
|
47
|
-
|
48
|
-
outfile = File.open(opts[:outfile], "w")
|
49
|
-
|
50
|
-
|
51
|
-
# create a sequence hash
|
52
|
-
sequence_hash = Hash.new
|
53
|
-
|
54
|
-
# create an array of strains
|
55
|
-
|
56
|
-
# hash key is strain_name, loop through strain_names
|
57
|
-
# create an empty array
|
58
|
-
strains.each do |strain_name|
|
59
|
-
sequence_hash[strain_name] = Array.new
|
60
|
-
end
|
61
|
-
|
62
|
-
snps = Snp.find_by_sql("SELECT snps.* FROM snps
|
63
|
-
INNER JOIN features
|
64
|
-
ON features.id = snps.feature_id
|
65
|
-
WHERE features.id IN
|
66
|
-
(select features.id from features
|
67
|
-
WHERE id NOT IN
|
68
|
-
(select distinct features.id FROM features
|
69
|
-
INNER JOIN annotations ON
|
70
|
-
annotations.feature_id = features.id
|
71
|
-
WHERE annotations.value LIKE '%(#{annotation})%'))")
|
72
|
-
|
73
|
-
|
74
|
-
#puts snps.size
|
75
|
-
puts "Your Query is submitted and is being processed......."
|
76
|
-
snps.each do |snp|
|
77
|
-
#break if i == 100
|
78
|
-
snp.alleles.each do |allele|
|
79
|
-
allele.genotypes.each do |genotype|
|
80
|
-
# puts genotype.inspect
|
81
|
-
sequence_hash[genotype.strain.name] << allele.base
|
82
|
-
end
|
83
|
-
end
|
84
|
-
end
|
85
|
-
|
86
|
-
strains.each do |sn|
|
87
|
-
outfile.print ">#{sn}\n" , sequence_hash[sn].join("")
|
88
|
-
outfile.puts
|
89
|
-
end
|
90
|
-
|
91
|
-
rescue
|
92
|
-
end
|
data/examples/example2.rb
DELETED
@@ -1,61 +0,0 @@
|
|
1
|
-
# This query script finds the unique snps amongs the list of strains provided.
|
2
|
-
# Only use this script once your database has been fully populated.
|
3
|
-
# Usage: ruby example2.rb -d your_db_name.sqlite3 -s list_of_your_species.txt
|
4
|
-
# Output is the number of unique snps in the list of your strains provided in the -s option.
|
5
|
-
# You may use this script to do other SQL queries. Just change the SQL query below with your query.
|
6
|
-
|
7
|
-
require 'snp_db_models'
|
8
|
-
gem "slop", "~> 2.4.0"
|
9
|
-
require 'slop'
|
10
|
-
|
11
|
-
opts = Slop.new :help do
|
12
|
-
banner "ruby query.rb [OPTIONS]"
|
13
|
-
|
14
|
-
on :V, :verbose, 'Enable verbose mode'
|
15
|
-
on :D, :database=, 'The name of the database you like to query', true
|
16
|
-
on :s, :strain=, 'The strains/samples you like to query', true
|
17
|
-
|
18
|
-
on_empty do
|
19
|
-
puts help
|
20
|
-
end
|
21
|
-
end
|
22
|
-
opts.parse
|
23
|
-
|
24
|
-
puts "You must supply the -D option, it's a required field" and exit unless opts[:database]
|
25
|
-
puts "You must supply the -s option, it's a required field" and exit unless opts[:strain]
|
26
|
-
|
27
|
-
begin
|
28
|
-
puts "#{opts[:database]} file does not exist!" and exit unless File.exist?(opts[:database])
|
29
|
-
rescue
|
30
|
-
end
|
31
|
-
|
32
|
-
begin
|
33
|
-
puts "#{opts[:strain]} file does not exist!" and exit unless File.exist?(opts[:strain])
|
34
|
-
rescue
|
35
|
-
end
|
36
|
-
|
37
|
-
|
38
|
-
establish_connection(opts[:database])
|
39
|
-
|
40
|
-
begin
|
41
|
-
strains = []
|
42
|
-
File.read(opts[:strain]).each_line do |line|
|
43
|
-
strains << line.chop
|
44
|
-
end
|
45
|
-
|
46
|
-
def find_shared_snps(strain_names)
|
47
|
-
*strain_names = strain_names
|
48
|
-
|
49
|
-
where_statement = strain_names.collect{|strain_name| "strains.name = '#{strain_name}' OR "}.join("").sub(/ OR $/, "")
|
50
|
-
|
51
|
-
return Snp.find_by_sql("SELECT * FROM (SELECT features.* from features INNER JOIN snps ON features.id = snps.feature_id INNER JOIN alleles ON alleles.snp_id = snps.id INNER JOIN genotypes ON alleles.id = genotypes.allele_id INNER JOIN strains ON strains.id = genotypes.strain_id WHERE (#{where_statement}) AND alleles.id <> snps.reference_allele_id AND (SELECT COUNT(*) from snps AS s INNER JOIN alleles ON alleles.snp_id = snps.id INNER JOIN genotypes ON alleles.id = genotypes.allele_id WHERE alleles.id <> snps.reference_allele_id and s.id = snps.id) = #{strain_names.size} GROUP BY snps.id HAVING COUNT(*) = #{strain_names.size})");
|
52
|
-
end
|
53
|
-
|
54
|
-
gas_snps = find_shared_snps(strains)
|
55
|
-
|
56
|
-
gas_snps.each do |snp|
|
57
|
-
puts "The number of unique snps are #{snp.id}"
|
58
|
-
end
|
59
|
-
|
60
|
-
rescue
|
61
|
-
end
|
data/examples/snp_db_models.rb
DELETED
@@ -1,32 +0,0 @@
|
|
1
|
-
require 'snp_db_connection'
|
2
|
-
|
3
|
-
class Strain < ActiveRecord::Base
|
4
|
-
has_many :alleles, :through => :genotypes
|
5
|
-
has_many :genotypes
|
6
|
-
end
|
7
|
-
|
8
|
-
class Feature < ActiveRecord::Base
|
9
|
-
has_many :annotations
|
10
|
-
has_many :snps
|
11
|
-
end
|
12
|
-
|
13
|
-
class Snp < ActiveRecord::Base
|
14
|
-
belongs_to :feature
|
15
|
-
has_many :alleles
|
16
|
-
belongs_to :reference_allele, :class_name => "Allele", :foreign_key => "reference_allele_id"
|
17
|
-
end
|
18
|
-
|
19
|
-
class Allele < ActiveRecord::Base
|
20
|
-
has_many :genotypes
|
21
|
-
belongs_to :snp
|
22
|
-
has_many :strains, :through => :genotypes
|
23
|
-
end
|
24
|
-
|
25
|
-
class Genotype < ActiveRecord::Base
|
26
|
-
belongs_to :allele
|
27
|
-
belongs_to :strain
|
28
|
-
end
|
29
|
-
|
30
|
-
class Annotation < ActiveRecord::Base
|
31
|
-
belongs_to :feature
|
32
|
-
end
|