snp-search 0.27.2 → 0.29.0

data/README.rdoc CHANGED
@@ -23,20 +23,20 @@ To run snp-search, you need to have 2 files:
 
  1- Variant Call Format (.vcf) file (which contains the SNP information)
 
- 2- Your reference genome that you used to generate your .vcf file (in genbank or embl format, the script will automatically detect the format).
+ 2- Your database reference genome that you used to generate your .vcf file (in genbank or embl format, the script will automatically detect the format).
 
  Once you have these files ready, you may run snp-search with the following options:
 
  -V Enable verbose mode
- -n Name of your database Optional, default = snp_db.sqlite3
+ -n Name of your database
  -v .vcf file Required
- -r Reference genome file (The same file that was used in generating the .vcf file). This should be in genbank or embl format. Required
+ -d Database Reference genome (The same file that was used in generating the .vcf file). This should be in genbank or embl format. Required
  -c SNP quality score cutoff. A Phred-scaled quality score. High quality scores indicate high confidence calls. Optional, default = 90 (out of 100)
  -t Genotype Quality score cutoff. Phred-scaled quality score that the genotype is true. Optional, default = 30
  -h help message
 
  Usage:
- snp-search -n my_snp_db.sqlite3 -r my_ref.gbk -v my_vcf_file.vcf
+ snp-search -n my_snp_db.sqlite3 -d my_ref.gbk -v my_vcf_file.vcf
 
  == Output
  The output is your database in sqlite3 format. If you like to view your table(s) and perform queries you can type
@@ -52,13 +52,13 @@ We have included two example queries that you may find useful:
 
  Usage:
 
- ruby example1.rb -d your_db_name.sqlite3 -s list_of_your_species.txt -o output.fasta
+ ruby example1.rb -D your_db_name.sqlite3 -s list_of_your_species.txt -o output.fasta
 
  * Example2: This script queries the database and selects the number of unique SNPs within the list of the strains/samples provided. The output is the number of unique SNPs.
 
  Usage:
 
- ruby example2.rb -d your_db_name.sqlite3 -s list_of_your_species.txt
+ ruby example2.rb -D your_db_name.sqlite3 -s list_of_your_species.txt
 
 
  == Contact
data/VERSION CHANGED
@@ -1 +1 @@
- 0.27.2
+ 0.29.0
data/bin/snp-search CHANGED
@@ -6,7 +6,6 @@ require 'snp_db_schema'
  gem "slop", "~> 2.4.0"
  require 'slop'
 
- begin
  opts = Slop.new :help do
  banner "ruby snp-search [OPTIONS]"
 
@@ -32,22 +31,16 @@ opts.parse
  exit
  end
 
- begin
  abort "#{opts[:reference_file]} file does not exist!" unless File.exist?(opts[:reference_file])
- rescue
- end
 
- begin
  abort "#{opts[:vcf_file]} file does not exist!" unless File.exist?(opts[:vcf_file])
- rescue
- end
 
 
- # Enter the name of your database
+ # Name of your database
  establish_connection(opts[:name])
 
  # Schema will run here
- #db_schema
+ db_schema
 
  ref = opts[:reference_file]
 
@@ -68,10 +61,8 @@ sequence_format = guess_sequence_format(ref)
  vcf_mpileup_file = opts[:vcf_file]
 
  # The populate_features_and_annotations method populates the features and annotations. It uses the embl/gbk file.
- populate_features_and_annotations(sequence_flatfile)
+ # populate_features_and_annotations(sequence_flatfile)
 
  #The populate_snps_alleles_genotypes method populates the snps, alleles and genotypes. It uses the vcf file, and if specified, the SNP quality cutoff and genotype quality cutoff
  populate_snps_alleles_genotypes(vcf_mpileup_file, opts[:cuttoff_snp].to_i, opts[:cuttoff_genotype].to_i)
 
- rescue
- end
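Note on the file checks in data/bin/snp-search: with the surrounding begin/rescue blocks removed, File.exist? now receives the option value directly. Assuming an option that was not supplied comes back as nil (an assumption about the option parser, not something this diff shows), File.exist?(nil) raises TypeError rather than printing the abort message. A minimal sketch of checking for a missing value first; file_option_error is a hypothetical helper and the Hash stands in for the parsed options.

```ruby
# Hypothetical helper: return an error message for a file option,
# or nil when the option is present and the file exists.
def file_option_error(path, flag)
  # Test for a missing value first, because File.exist?(nil) raises TypeError.
  return "You must supply the -#{flag} option" if path.nil?
  return "#{path} file does not exist!" unless File.exist?(path)
  nil
end

# Plain Hash standing in for the parsed options; :vcf_file was not supplied.
opts = { :reference_file => "my_ref.gbk" }
[[:reference_file, "d"], [:vcf_file, "v"]].each do |key, flag|
  error = file_option_error(opts[key], flag)
  # The real script would call abort(error) here; this sketch only prints.
  puts error if error
end
```

In the script itself, the printed message could be passed to abort, which writes to stderr and exits with a non-zero status.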
data/examples/example2.rb CHANGED
@@ -12,7 +12,7 @@ opts = Slop.new :help do
  banner "ruby query.rb [OPTIONS]"
 
  on :V, :verbose, 'Enable verbose mode'
- on :d, :database=, 'The name of the database you like to query', true
+ on :D, :database=, 'The name of the database you like to query', true
  on :s, :strain=, 'The strains/samples you like to query', true
 
  on_empty do
@@ -21,7 +21,7 @@ opts = Slop.new :help do
  end
  opts.parse
 
- puts "You must supply the -d option, it's a required field" and exit unless opts[:database]
+ puts "You must supply the -D option, it's a required field" and exit unless opts[:database]
  puts "You must supply the -s option, it's a required field" and exit unless opts[:strain]
 
  begin
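Note on the required-option checks in data/examples/example2.rb: puts returns nil, so in `puts "..." and exit` the right-hand side of `and` is never evaluated and the script carries on after printing the message. A minimal sketch of the behaviour and one possible alternative; puts_then_and and missing_option_error are hypothetical helpers, and the Hash stands in for the parsed Slop options.

```ruby
# Demonstrates the short-circuit: returns whether the code after `and` ran.
def puts_then_and
  reached = false
  # puts returns nil, so the assignment on the right is skipped.
  (puts "You must supply the -D option, it's a required field") and (reached = true)
  reached
end

# Hypothetical helper: return an error message for a missing option, or nil.
def missing_option_error(opts, key, flag)
  return nil if opts[key]
  "You must supply the -#{flag} option, it's a required field"
end

p puts_then_and

# Plain Hash standing in for the parsed options; :database was not supplied.
opts = { :strain => "list_of_your_species.txt" }
error = missing_option_error(opts, :database, "D")
# The real script would call abort(error) here, which prints to stderr and
# exits with a non-zero status; this sketch only prints the message.
puts error if error
```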
data/lib/snp-search.rb CHANGED
@@ -4,6 +4,7 @@ require 'bio'
  require 'snp_db_models'
  require 'activerecord-import'
 
+ #This method guesses the reference sequence file format
  def guess_sequence_format(reference_genome)
  file_extension = File.extname(reference_genome).downcase
  file_format = nil
@@ -16,10 +17,10 @@ def guess_sequence_format(reference_genome)
  return file_format
  end
 
- # A method to populate the database with the features (genes etc) and the annotations from the embl file.
+ # A method to populate the database with the features (genes etc) and the annotations from the gbk/embl file.
  # We include all features that are not 'source' or 'gene' as they are repetitive info. 'CDS' is the gene.
  # The annotation table includes also the start and end coordinates of the CDS. The strand is also included. the 'locations' method is defined in bioruby under genbank. It must be required at the top (bio).
- #Also, the qualifier and value are extracted from the embl file and added to the database.
+ #Also, the qualifier and value are extracted from the gbk/embl file and added to the database.
  def populate_features_and_annotations(sequence_file)
  puts "Adding features and their annotations...."
  ActiveRecord::Base.transaction do
@@ -48,19 +49,16 @@ def populate_features_and_annotations(sequence_file)
  end
 
  #This method populates the rest of the information, i.e. SNP information, Alleles and Genotypes.
- # It requires the strain_names as array and the output (vcf file) from mpileup-snp identification algorithm.
  def populate_snps_alleles_genotypes(vcf_file, cuttoff_snp, cuttoff_genotype)
  puts "Adding SNPs........"
  # open vcf file and parse each line
  File.open(vcf_file) do |f|
  # header names
- while line = f.gets
+ while line = f.gets.chomp!
  if line =~ /CHROM/
- #puts line
  column_headings = line.split("\t")
  strain_names = column_headings[9..-1]
  strain_names.map!{|name| name.sub(/\..*/, '')}
- #puts strain_names
 
  strain_names.each do |str|
  ss = Strain.new
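Note on the new loop condition `while line = f.gets.chomp!`: f.gets returns nil at end of file, so calling chomp! on it raises NoMethodError, and chomp! itself returns nil when the line has no trailing newline, which would end the loop before that line is processed. A minimal sketch of a loop that avoids both cases by using the non-mutating chomp inside the loop body; read_chomped_lines is a hypothetical helper and the StringIO stands in for the opened .vcf file.

```ruby
require 'stringio'

# Hypothetical helper: collect newline-stripped lines from any IO-like object.
def read_chomped_lines(io)
  lines = []
  # io.gets returns nil at end of input, which ends the loop cleanly.
  while (line = io.gets)
    # chomp (without !) always returns a String, even when there is
    # no trailing newline to strip.
    lines << line.chomp
  end
  lines
end

# Stand-in for a .vcf file; the last line deliberately has no trailing newline.
sample = StringIO.new("#CHROM\tPOS\n1\t100")
p read_chomped_lines(sample)
```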
@@ -116,8 +114,8 @@ puts "Adding SNPs........"
  s = Snp.new
  s.ref_pos = ref_pos
  s.save
-
- # create ref allele
+
+ # create ref allele
  ref_allele = Allele.new
  ref_allele.base = ref_base
  ref_allele.snp = s
@@ -132,14 +130,11 @@ puts "Adding SNPs........"
  snp_allele.snp = s
  snp_allele.save
 
- a = Time.now
- # geno = [:ref_allele, :snp_allele]
  ActiveRecord::Base.transaction do
  genotypes.each_with_index do |gt, index|
  genotype = Genotype.new
  genotype.strain = strains[index]
  puts index if strains[index].nil?
- # print "#{gt}(#{genotypes_qualities[index]}) "
  if gt == "0/0" # wild type
  genotype.allele = ref_allele
  elsif gt == "1/1" # snp type
data/snp-search.gemspec CHANGED
@@ -5,11 +5,11 @@
 
  Gem::Specification.new do |s|
  s.name = "snp-search"
- s.version = "0.27.2"
+ s.version = "0.29.0"
 
  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
  s.authors = ["Ali Al-Shahib", "Anthony Underwood"]
- s.date = "2011-12-16"
+ s.date = "2012-01-05"
  s.description = "Use the snp-search toolset to query the SNP database"
  s.email = "ali.al-shahib@hpa.org.uk"
  s.executables = ["snp-search"]
@@ -28,10 +28,8 @@ Gem::Specification.new do |s|
  "Rakefile",
  "VERSION",
  "bin/snp-search",
- "examples/ali.txt",
  "examples/example1.rb",
  "examples/example2.rb",
- "examples/list_of_GAS_strains.txt",
  "examples/snp_db_models.rb",
  "lib/snp-search.rb",
  "lib/snp_db_connection.rb",
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: snp-search
  version: !ruby/object:Gem::Version
- version: 0.27.2
+ version: 0.29.0
  prerelease:
  platform: ruby
  authors:
@@ -10,11 +10,11 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2011-12-16 00:00:00.000000000Z
+ date: 2012-01-05 00:00:00.000000000Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: activerecord
- requirement: &2153055400 !ruby/object:Gem::Requirement
+ requirement: &2173599960 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -22,10 +22,10 @@ dependencies:
  version: 3.1.3
  type: :runtime
  prerelease: false
- version_requirements: *2153055400
+ version_requirements: *2173599960
  - !ruby/object:Gem::Dependency
  name: bio
- requirement: &2153054800 !ruby/object:Gem::Requirement
+ requirement: &2173599480 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -33,10 +33,10 @@ dependencies:
  version: 1.4.2
  type: :runtime
  prerelease: false
- version_requirements: *2153054800
+ version_requirements: *2173599480
  - !ruby/object:Gem::Dependency
  name: slop
- requirement: &2153054200 !ruby/object:Gem::Requirement
+ requirement: &2173599000 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -44,10 +44,10 @@ dependencies:
  version: 2.4.0
  type: :runtime
  prerelease: false
- version_requirements: *2153054200
+ version_requirements: *2173599000
  - !ruby/object:Gem::Dependency
  name: sqlite3
- requirement: &2153053500 !ruby/object:Gem::Requirement
+ requirement: &2173598520 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -55,10 +55,10 @@ dependencies:
  version: 1.3.4
  type: :runtime
  prerelease: false
- version_requirements: *2153053500
+ version_requirements: *2173598520
  - !ruby/object:Gem::Dependency
  name: activerecord-import
- requirement: &2153052900 !ruby/object:Gem::Requirement
+ requirement: &2173598040 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -66,10 +66,10 @@ dependencies:
  version: 0.2.8
  type: :runtime
  prerelease: false
- version_requirements: *2153052900
+ version_requirements: *2173598040
  - !ruby/object:Gem::Dependency
  name: rspec
- requirement: &2153052360 !ruby/object:Gem::Requirement
+ requirement: &2173597560 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -77,10 +77,10 @@ dependencies:
  version: 2.3.0
  type: :development
  prerelease: false
- version_requirements: *2153052360
+ version_requirements: *2173597560
  - !ruby/object:Gem::Dependency
  name: bundler
- requirement: &2153051780 !ruby/object:Gem::Requirement
+ requirement: &2173597080 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -88,10 +88,10 @@ dependencies:
  version: 1.0.0
  type: :development
  prerelease: false
- version_requirements: *2153051780
+ version_requirements: *2173597080
  - !ruby/object:Gem::Dependency
  name: jeweler
- requirement: &2153051180 !ruby/object:Gem::Requirement
+ requirement: &2173596600 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -99,10 +99,10 @@ dependencies:
  version: 1.6.4
  type: :development
  prerelease: false
- version_requirements: *2153051180
+ version_requirements: *2173596600
  - !ruby/object:Gem::Dependency
  name: rcov
- requirement: &2153050660 !ruby/object:Gem::Requirement
+ requirement: &2173596100 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
@@ -110,7 +110,7 @@ dependencies:
  version: '0'
  type: :development
  prerelease: false
- version_requirements: *2153050660
+ version_requirements: *2173596100
  description: Use the snp-search toolset to query the SNP database
  email: ali.al-shahib@hpa.org.uk
  executables:
@@ -130,10 +130,8 @@ files:
  - Rakefile
  - VERSION
  - bin/snp-search
- - examples/ali.txt
  - examples/example1.rb
  - examples/example2.rb
- - examples/list_of_GAS_strains.txt
  - examples/snp_db_models.rb
  - lib/snp-search.rb
  - lib/snp_db_connection.rb
@@ -157,7 +155,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
  version: '0'
  segments:
  - 0
- hash: -4131352936565769320
+ hash: 3565475857465083305
  required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
data/examples/ali.txt DELETED
File without changes
@@ -1,212 +0,0 @@
- H041200152
- H041260144
- H041320325
- H041380342
- H041520010
- H041620019
- H041680416
- H041740036
- H041860019
- H042140018
- H040300050
- H042220216
- H041400347
- H041460313
- H042320017
- H040360231
- H040640029
- H040660409
- H040680243
- H040700032
- H040960438
- H041080566
- H041120010
- H044140024
- H044300140
- H044600024
- H044760104
- H045220067
- H045220068
- H050200647
- H050620079
- H080540148
- H080580153
- H042380354
- H080680108
- H080700014
- H080920125
- H081100032
- H042560017
- H042660021
- H042880341
- H043080575
- H043280056
- H043820282
- H043920025
- H044020657
- H081880240
- H081940295
- H081980138
- H082060034
- H082140058
- H082160009
- H082160010
- H082240367
- H082320060
- H082340085
- H081120076
- H082340087
- H082400032
- H082420086
- H081180049
- H081200095
- H081220201
- H081380045
- H081420209
- H081520265
- H081520277
- H081860326
- H085140304
- H085180220
- H085180222
- H090140403
- H090140410
- H090220294
- H090240476
- H090340417
- H090360446
- H090400355
- H082500121
- H090460251
- H090480157
- H090500230
- H082660048
- H082780120
- H082800043
- H082800044
- H082980048
- H083140517
- H083200136
- H084740200
- H091280140
- H091340198
- H091300415
- H091360124
- H091380126
- H091540320
- H091680457
- H091740414
- H091760233
- H091760238
- H090580174
- H091960112
- H091960708
- H091980013
- H090640283
- H090700200
- H090780401
- H090800100
- H090920456
- H091200143
- H091220182
- H091220183
- H090480158
- H090580172
- H090600241
- H090600242
- H090640289
- H090740100
- H090940181
- H090960223
- H091300411
- H091560205
- H092120149
- H091640755
- H091680438
- H091760235
- H092260222
- H092480114
- H092520139
- H092780164
- H093980210
- H095060138
- H095160155
- H090200297
- H094540354
- H094760078
- H095080492
- H095100188
- H095240140
- H100180477
- H095260498
- H094680245
- H094560504
- H094360202
- H091780500
- H094180239
- H094160182
- H093840091
- H091820284
- H092080099
- H093040584
- H093080338
- H093180215
- H093340223
- H093420437
- H093640539
- H093420432
- H093380123
- H093360238
- H093260335
- H093200228
- H093180214
- H093120266
- H093100592
- H093080653
- H093060566
- H093780119
- H092940206
- H092940205
- H092920374
- H093700446
- H093640534
- H093560718
- H093560202
- H093520331
- H093500353
- H093440288
- H093420433
- H092560200
- H092300123
- H092300122
- H092280040
- H092260221
- H091940171
- H090200284
- H091820288
- H091780492
- H091640750
- H092920373
- H090980124
- H090900473
- H090300377
- H092920369
- H092880154
- H092880153
- H092860554
- H092860549
- H092840305
- H092800528
- H092600318
- MGAS10270
- MGAS10394
- MGAS10750
- MGAS2096
- MGAS315
- MGAS5005
- MGAS6180
- MGAS8232
- MGAS9429
- Manfredo
- NZ131
- SSI