codon_table_parser 0.2.0
- data/LICENSE +20 -0
- data/README.md +290 -0
- data/Rakefile +25 -0
- data/lib/codon_table_parser/version.rb +3 -0
- data/lib/codon_table_parser.rb +189 -0
- metadata +66 -0
data/LICENSE
ADDED
@@ -0,0 +1,20 @@
Copyright (c) 2011 Stefan Rohlfing

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,290 @@
# CodonTableParser

Parses the [NCBI genetic code](ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt) table with a multiline regex, generating hash maps of each species' name, start codons, stop codons and codon table.
The output can be easily customized and used to update the respective constants of BioRuby's [CodonTable](https://github.com/bioruby/bioruby/blob/master/lib/bio/data/codontable.rb) class whenever the original data changes.

## Installation

``` bash
gem install codon_table_parser
```

## Usage

Without any parameters, the genetic code file is downloaded directly from the NCBI web site:

``` ruby
parser = CodonTableParser.new
```

Alternatively, the genetic code file can be loaded from a local path:

``` ruby
file = 'path/to/genetic_code.txt'
parser = CodonTableParser.new(file)
```

The first line of the file is read to determine whether the content is correct. If it is not, an exception is raised:

``` ruby
wrong_content = 'path/to/wrong_content.txt'
parser = CodonTableParser.new(wrong_content)
# Exception: This is not the NCBI genetic code table
```
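
As a rough illustration of that first-line check: the published source tests the first line against the pattern `/--\*+/`, that is, two dashes followed by one or more asterisks. The helper name `ncbi_header?` and the constant `NCBI_HEADER` below are hypothetical and not part of the gem's API; this is only a sketch of the idea.

``` ruby
# Hypothetical helper, not part of CodonTableParser's public API.
# The pattern itself is the one used in lib/codon_table_parser.rb.
NCBI_HEADER = /--\*+/

def ncbi_header?(first_line)
  # =~ returns a match position or nil; !! turns that into true/false
  !!(first_line =~ NCBI_HEADER)
end

ncbi_header?("--**************\n")  # a header-style first line
ncbi_header?("just some text\n")    # any other content
```

Because only the first line is inspected, a file that merely starts with such a line would pass this check even if the rest of it is not a genetic code table.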

### Instance Methods

The following instance methods are available:

* CodonTableParser#definitions
* CodonTableParser#starts
* CodonTableParser#stops
* CodonTableParser#tables
* CodonTableParser#bundle

Every instance method can take a *:range* option that specifies the ids of the species to be considered in the output.
A range is specified as an array of integers, Ranges or both.
Example:

``` ruby
:range => [(1..3), 5, 9] # converted internally to [1, 2, 3, 5, 9]
```

Ids not present in the original data are ignored.
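
The conversion of a mixed array into a flat list of ids can be sketched as follows. This is a standalone approximation of what the gem does in a private method; the name `expand_range` and the `known_ids` list are assumptions made for this example.

``` ruby
# Hypothetical standalone helper; the gem performs this step internally.
def expand_range(range)
  range.map { |val| val.is_a?(Range) ? val.to_a : val } # expand each Range
       .flatten                                          # merge into one list
       .uniq                                             # drop duplicate ids
       .sort
end

known_ids = [1, 2, 3, 4, 5, 6, 9, 10, 11]    # ids assumed to exist in the data
wanted    = expand_range([(1..3), 5, 7, 9])  # => [1, 2, 3, 5, 7, 9]
selected  = known_ids & wanted               # => [1, 2, 3, 5, 9]
```

The id 7 is requested but missing from `known_ids`, so it simply does not appear in the result.
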
Besides the *:range* option, several methods also take other options as demonstrated below.

#### CodonTableParser#definitions

``` ruby
parser = CodonTableParser.new

# Return default hash map of names
definitions = parser.definitions

definitions
# {1=>"Standard",
#  2=>"Vertebrate Mitochondrial",
#  3=>"Yeast Mitochondrial",
#  4=>"Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate Mitochondrial; Mycoplasma; Spiroplasma",
#  5=>"Invertebrate Mitochondrial",
#  6=>"Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear",
#  9=>"Echinoderm Mitochondrial; Flatworm Mitochondrial",
#  10=>"Euplotid Nuclear",
#  11=>"Bacterial and Plant Plastid",
#  12=>"Alternative Yeast Nuclear",
#  13=>"Ascidian Mitochondrial",
#  14=>"Alternative Flatworm Mitochondrial",
#  15=>"Blepharisma Macronuclear",
#  16=>"Chlorophycean Mitochondrial",
#  21=>"Trematode Mitochondrial",
#  22=>"Scenedesmus obliquus Mitochondrial",
#  23=>"Thraustochytrium Mitochondrial"}

# Return the names for the ids specified in :range
definitions = parser.definitions :range => [(1..3), 5, 9]

# Return default hash map with custom names for the ids 1 and 3
definitions = parser.definitions :names => {1 => "Standard (Eukaryote)",
                                            3 => "Yeast Mitochondorial"}
definitions[1]
# "Standard (Eukaryote)"
definitions[3]
# "Yeast Mitochondorial"

# Return the names for the ids specified in :range, with custom names for the ids 1 and 3
parser.definitions :range => [(1..3), 5, 9],
                   :names => {1 => "Standard (Eukaryote)",
                              3 => "Yeast Mitochondorial"}
```
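
The ids and names shown above are extracted from the records of the genetic code file with a regex. The following is a simplified sketch of that idea, not the gem's actual pattern. The two records in `SAMPLE` are abridged and use shortened placeholder strings for `ncbieaa` and `sncbieaa`; the constant names `SAMPLE` and `RECORD` are made up for this example.

``` ruby
# Abridged sample in the style of the genetic code file. The ncbieaa and
# sncbieaa strings are shortened placeholders, not real data.
SAMPLE = <<-EOS
 {
  name "Standard" ,
  name "SGC0" ,
  id 1 ,
  ncbieaa  "FFLL",
  sncbieaa "---M"
 },
 {
  name "Bacterial and Plant Plastid" ,
  id 11 ,
  ncbieaa  "FFLL",
  sncbieaa "---M"
 }
EOS

# Simplified record pattern: long name, optional short name, id and the two
# amino acid strings. \s also matches newlines, so a single pattern can span
# the several lines of one record.
RECORD = /name "([^"]*)"\s*,\s*(?:name "([^"]*)"\s*,\s*)?id (\d+)\s*,\s*ncbieaa\s+"([^"]*)"\s*,\s*sncbieaa\s+"([^"]*)"/

records = SAMPLE.scan(RECORD).map do |long_name, short_name, id, ncbieaa, sncbieaa|
  { :id => id.to_i, :long_name => long_name, :short_name => short_name,
    :ncbieaa => ncbieaa, :sncbieaa => sncbieaa }
end

definitions = Hash[records.map { |r| [r[:id], r[:long_name]] }]
# => {1=>"Standard", 11=>"Bacterial and Plant Plastid"}
```

The second sample record has no short name, so its optional capture group stays `nil`; this is why the short name has to be treated as optional in the pattern.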

#### CodonTableParser#starts

``` ruby
parser = CodonTableParser.new

# Return default hash map of start codons
start_codons = parser.starts

start_codons
# {1=>["ttg", "ctg", "atg"],
#  2=>["att", "atc", "ata", "atg", "gtg"],
#  3=>["ata", "atg"],
#  4=>["tta", "ttg", "ctg", "att", "atc", "ata", "atg", "gtg"],
#  5=>["ttg", "att", "atc", "ata", "atg", "gtg"],
#  6=>["atg"],
#  9=>["atg", "gtg"],
#  10=>["atg"],
#  11=>["ttg", "ctg", "att", "atc", "ata", "atg", "gtg"],
#  12=>["ctg", "atg"],
#  13=>["ttg", "ata", "atg", "gtg"],
#  14=>["atg"],
#  15=>["atg"],
#  16=>["atg"],
#  21=>["atg", "gtg"],
#  22=>["atg"],
#  23=>["att", "atg", "gtg"]}

# Return the start codons for the ids specified in :range
start_codons = parser.starts :range => [(1..3), 5, 9]

# Add or remove start codons as necessary
start_codons = parser.starts 1 => {:add => ['gtg']},
                             13 => {:remove => ['ttg', 'ata', 'gtg']}

start_codons[1]
# ["ttg", "ctg", "atg", "gtg"]
start_codons[13]
# ["atg"]

# Alternative syntax, normally only used in the bundle method described below
start_codons = parser.starts :starts => {1 => {:add => ['gtg']},
                                         13 => {:remove => ['ttg', 'ata', 'gtg']}}

# Return the start codons for the ids specified with :range, add or remove codons from specific ids
start_codons = parser.starts :range => [(1..3), 13],
                             1 => {:add => ['gtg']},
                             13 => {:remove => ['ttg', 'ata', 'gtg']}
```
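
The effect of the `:add` and `:remove` options can be sketched in isolation. The function below is a standalone approximation of the gem's private helper; the name `apply_codon_options` is hypothetical. Additions are applied as a set union, so an already present codon is not duplicated, and removals are applied afterwards.

``` ruby
# Hypothetical standalone version of the add/remove step.
def apply_codon_options(codons, opt)
  return codons unless opt
  codons = codons | opt[:add] if opt[:add] # union keeps order, skips duplicates
  codons = codons.reject { |codon| opt[:remove].include?(codon) } if opt[:remove]
  codons
end

apply_codon_options(%w[ttg ctg atg], :add => ['gtg'])
# => ["ttg", "ctg", "atg", "gtg"]
apply_codon_options(%w[ttg ata atg gtg], :remove => ['ttg', 'ata', 'gtg'])
# => ["atg"]
```

Since removal runs last, a codon listed under both `:add` and `:remove` ends up removed.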

#### CodonTableParser#stops

``` ruby
parser = CodonTableParser.new

# Return the default hash map of stop codons
stop_codons = parser.stops

stop_codons
# {1=>["taa", "tag", "tga"],
#  2=>["taa", "tag", "aga", "agg"],
#  3=>["taa", "tag"],
#  4=>["taa", "tag"],
#  5=>["taa", "tag"],
#  6=>["tga"],
#  9=>["taa", "tag"],
#  10=>["taa", "tag"],
#  11=>["taa", "tag", "tga"],
#  12=>["taa", "tag", "tga"],
#  13=>["taa", "tag"],
#  14=>["tag"],
#  15=>["taa", "tga"],
#  16=>["taa", "tga"],
#  21=>["taa", "tag"],
#  22=>["tca", "taa", "tga"],
#  23=>["tta", "taa", "tag", "tga"]}

# Return the stop codons for the ids specified with :range
stop_codons = parser.stops :range => [(1..3), 5, 9]

# Add or remove stop codons as necessary
stop_codons = parser.stops 1 => {:add => ['gtg'], :remove => ['taa']},
                           13 => {:add => ['gcc'], :remove => ['taa', 'tag']}

stop_codons[1]
# ["tag", "tga", "gtg"]
stop_codons[13]
# ["gcc"]

# Alternative syntax, normally only used in the bundle method described below
stop_codons = parser.stops :stops => {1 => {:add => ['gtg'], :remove => ['taa']},
                                      13 => {:add => ['gcc'], :remove => ['taa', 'tag']}}

# Return the stop codons for the ids specified with :range, add or remove codons from specific ids
stop_codons = parser.stops :range => [(1..3), 5, 13],
                           1 => {:add => ['gtg'], :remove => ['taa']},
                           13 => {:add => ['gcc'], :remove => ['taa', 'tag']}
```

#### CodonTableParser#tables

``` ruby
parser = CodonTableParser.new

# Return codon tables of all species
codon_tables = parser.tables

codon_tables
# {
#   1 => {
#     'ttt' => 'F', 'tct' => 'S', 'tat' => 'Y', 'tgt' => 'C',
#     'ttc' => 'F', 'tcc' => 'S', 'tac' => 'Y', 'tgc' => 'C',
#     'tta' => 'L', 'tca' => 'S', 'taa' => '*', 'tga' => '*',
#     'ttg' => 'L', 'tcg' => 'S', 'tag' => '*', 'tgg' => 'W',
#
#     'ctt' => 'L', 'cct' => 'P', 'cat' => 'H', 'cgt' => 'R',
#     'ctc' => 'L', 'ccc' => 'P', 'cac' => 'H', 'cgc' => 'R',
#     'cta' => 'L', 'cca' => 'P', 'caa' => 'Q', 'cga' => 'R',
#     'ctg' => 'L', 'ccg' => 'P', 'cag' => 'Q', 'cgg' => 'R',
#
#     'att' => 'I', 'act' => 'T', 'aat' => 'N', 'agt' => 'S',
#     'atc' => 'I', 'acc' => 'T', 'aac' => 'N', 'agc' => 'S',
#     'ata' => 'I', 'aca' => 'T', 'aaa' => 'K', 'aga' => 'R',
#     'atg' => 'M', 'acg' => 'T', 'aag' => 'K', 'agg' => 'R',
#
#     'gtt' => 'V', 'gct' => 'A', 'gat' => 'D', 'ggt' => 'G',
#     'gtc' => 'V', 'gcc' => 'A', 'gac' => 'D', 'ggc' => 'G',
#     'gta' => 'V', 'gca' => 'A', 'gaa' => 'E', 'gga' => 'G',
#     'gtg' => 'V', 'gcg' => 'A', 'gag' => 'E', 'ggg' => 'G',
#   },
#   2 => { ... },
#   3 => { ... },
#   ...
#   23 => { ... }
# }

# Return the codon tables for the ids specified with :range
codon_tables = parser.tables :range => [(1..3), 5, 9, 23]
```
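
Each codon table is a position-wise pairing of the 64 codons with the 64 characters of the `ncbieaa` string. Stop codons are the positions holding `*` in `ncbieaa`, and start codons are the positions holding `M` in `sncbieaa`. The sketch below shows that pairing for the Standard code (id 1) without needing the gem or the data file. Two things are assumptions here: the codons are generated in T, C, A, G order instead of being read from the file's `Base1` to `Base3` lines, and the `sncbieaa` value is reconstructed from the start codons listed in the *starts* section. The constant names are made up for this example.

``` ruby
BASES = %w[T C A G]

# Codons in T, C, A, G order: ttt, ttc, tta, ttg, tct, ...
CODONS = BASES.product(BASES, BASES).map { |triplet| triplet.join.downcase }

# Amino acids of the Standard code, one character per codon.
NCBIEAA  = 'FFLLSSSSYY**CC*W' \
           'LLLLPPPPHHQQRRRR' \
           'IIIMTTTTNNKKSSRR' \
           'VVVVAAAADDEEGGGG'
# 'M' marks a start codon (the positions of ttg, ctg and atg), '-' anything else.
SNCBIEAA = '---M' + '-' * 15 + 'M' + '-' * 15 + 'M' + '-' * 28

table  = Hash[CODONS.zip(NCBIEAA.chars.to_a)]
stops  = CODONS.select.with_index { |_, i| NCBIEAA[i]  == '*' }
starts = CODONS.select.with_index { |_, i| SNCBIEAA[i] == 'M' }

table['atg'] # => "M"
stops        # => ["taa", "tag", "tga"]
starts       # => ["ttg", "ctg", "atg"]
```

The results agree with the entries for id 1 in the outputs of the *stops*, *starts* and *tables* methods shown above.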

#### CodonTableParser#bundle

``` ruby
parser = CodonTableParser.new

# Return the definitions, codon table, start and stop codons for all species as a hash map
bundle = parser.bundle

bundle
# {:definitions => {return value of the 'definitions' method}
#  :starts      => {return value of the 'starts' method}
#  :stops       => {return value of the 'stops' method}
#  :tables      => {return value of the 'tables' method}
# }
```

The *bundle* method accepts all options from the methods described above, that is:

* :range (applied to all methods)
* :names (applied to the *definitions* method)
* :starts (applied to the *starts* method)
* :stops (applied to the *stops* method)

To return the same values as are assigned to the constants *DEFINITIONS*, *STARTS*, *STOPS*, and *TABLES* of BioRuby's [CodonTable](https://github.com/bioruby/bioruby/blob/master/lib/bio/data/codontable.rb) class, call *bundle* with the following options:

``` ruby
bundle = parser.bundle :names => {1 => "Standard (Eukaryote)",
                                  4 => "Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma",
                                  3 => "Yeast Mitochondorial",
                                  6 => "Ciliate Macronuclear and Dasycladacean",
                                  9 => "Echinoderm Mitochondrial",
                                  11 => "Bacteria",
                                  14 => "Flatworm Mitochondrial",
                                  22 => "Scenedesmus obliquus mitochondrial"},
                       :starts => {1 => {:add => ['gtg']},
                                   13 => {:remove => ['ttg', 'ata', 'gtg']}}
```
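
One possible way to turn such a bundle into text that can be pasted over existing constants is sketched below. This is not something the gem provides: `constants_source` is a hypothetical helper, and `sample` is a tiny hand-written stand-in for the hash returned by *bundle*.

``` ruby
# Hypothetical formatter, not part of CodonTableParser.
# Turns each key of a bundle-like hash into a Ruby constant assignment.
def constants_source(bundle)
  bundle.map do |key, value|
    "#{key.to_s.upcase} = #{value.inspect}"
  end.join("\n\n")
end

# Hand-written stand-in for the result of parser.bundle.
sample = {
  :definitions => {1 => "Standard (Eukaryote)"},
  :starts      => {1 => ["ttg", "ctg", "atg", "gtg"]}
}

puts constants_source(sample)
```

The exact text produced by `inspect` differs between Ruby versions, so the output is best reviewed before it replaces anything by hand.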
data/Rakefile
ADDED
@@ -0,0 +1,25 @@
# require 'spec/rake/spectask' # deprecated
require 'rspec/core/rake_task'
require 'rake/gempackagetask'
require 'rdoc/task'

# Build gem: rake gem
# Push gem: rake push

task :default => [ :spec, :gem ]

RSpec::Core::RakeTask.new :spec

gem_spec = eval(File.read('codon_table_parser.gemspec'))

Rake::GemPackageTask.new( gem_spec ) do |t|
  t.need_zip = true
end

#RDoc::Task.new do |rdoc|
#
#end

task :push => :gem do |t|
  sh "gem push pkg/#{gem_spec.name}-#{gem_spec.version}.gem"
end
data/lib/codon_table_parser.rb
ADDED
@@ -0,0 +1,189 @@
# encoding: utf-8

# Parses the NCBI genetic code table, generating separate hash maps of each species' name, start & stop codons and codon table.
#
# The output can be customized and used to update the respective constants of BioRuby's CodonTable class.
class CodonTableParser

  attr_reader :address

  @default_address = 'ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt'

  class << self
    attr_accessor :default_address
  end

  def initialize(path = '')
    @address = CodonTableParser.default_address
    data = content(path)
    @codons = triplets(data)
    @parsed_data = parse(data)
  end

  def content path
    if path.empty?
      require 'open-uri'
      f = open(@address)
    else
      f = File.new(path)
    end

    first_line = f.readline
    raise Exception, "This is not the NCBI genetic code table" unless first_line =~ /--\*+/
    # Return the rest of the file unchanged. The original each_line loop had no
    # effect (each_line returns its receiver), and the '-- ' comment lines must
    # be kept because #bases reads the '-- Base1/2/3' lines.
    f.read
  end

  def triplets data
    base1, base2, base3 = bases data
    arr = []

    base1.each_with_index do |base, i|
      arr << (base + base2[i] + base3[i]).downcase
    end
    arr
  end

  def parse data
    del = /.*?\s/.source # .+ does not work as the regex is greedy.
    # del = /.*?(?=[a-z])/.source # Using non-greedy search + pos. lookahead also works
    l_name = /name "(.*?)"#{del}/.source
    s_name = /(|name ".*?"#{del})/.source # Either nothing (the line does not exist) or the short name.
    id = /id (\d+)#{del}/.source
    ncbieaa = /ncbieaa "(.*?)"#{del}/.source
    sncbieaa = /sncbieaa "(.*?)"/.source

    # flag 'o':
    # Perform inline substitutions (#{variable}) only once on creation.
    # Normally, the variable is inserted on every evaluation.
    result = data.scan(/#{l_name}#{s_name}#{id}#{ncbieaa}#{sncbieaa}/mo).
      inject([]) do |res, (l_name, s_name, id, ncbieaa, sncbieaa)|

      short = s_name.match(/[A-Z]{3}\d/)[0] unless s_name.empty?
      l_name = l_name.gsub(/\n/,'')

      res << {:id => id.to_i,
              :long_name => l_name,
              :short_name => short,
              :ncbieaa => ncbieaa,
              :sncbieaa => sncbieaa}
    end

    result
  end


  # In the methods below, custom_range returns nil for ids outside :range.
  # compact drops those nil entries before the hash is built.
  def definitions options = {}
    Hash[@parsed_data.map do |species|
      id = species[:id]
      name = species[:long_name]
      new_names = options[:names]
      if new_names
        name = new_names[id] if new_names[id]
      end
      custom_range(options[:range], id) {[id, name]}
    end.compact]
  end

  def starts options = {}
    Hash[@parsed_data.map do |species|
      codons = []
      species[:sncbieaa].split(//).each_with_index do |pos, i|
        if pos == 'M'
          codons << @codons[i]
        end
      end
      id = species[:id]
      # Options can either be passed as :starts => {1 => {:add => ...}} or 1 => {:add => ...}
      selection = options[:starts] || options
      codons = custom_codons(selection[id], codons)
      custom_range(options[:range], id) {[id, codons]}
    end.compact]
  end


  def stops options = {}
    Hash[@parsed_data.map do |species|
      codons = []
      species[:ncbieaa].split(//).each_with_index do |pos, i|
        if pos == '*'
          codons << @codons[i]
        end
      end

      id = species[:id]
      # Options can either be passed as :stops => {1 => {:add => ...}} or 1 => {:add => ...}
      selection = options[:stops] || options
      codons = custom_codons(selection[id], codons)
      custom_range(options[:range], id) {[id, codons]}
    end.compact]
  end

  def tables options = {}
    Hash[@parsed_data.map do |species|
      id = species[:id]
      codon_table = table(@codons, species[:ncbieaa])
      custom_range(options[:range], id) {[id, codon_table]}
    end.compact]
  end

  def bundle options = {}
    {:definitions => definitions(options),
     :starts => starts(options),
     :stops => stops(options),
     :tables => tables(options)}
  end


  def bases data
    del = /[^\n]*\n\s+/.source
    base1 = /-- Base1\s+([A-Z]+)#{del}/.source
    base2 = /-- Base2\s+([A-Z]+)#{del}/.source
    base3 = /-- Base3\s+([A-Z]+)/.source
    data.scan(/#{base1}#{base2}#{base3}/m).first.map do |base|
      base.split(//)
    end
  end

  def prepare_range range
    require 'set'
    range.map do |val|
      val.is_a?(Range) ? val.to_a : val
    end.flatten.to_set.sort
  end


  def custom_range options, id, &block
    range = prepare_range options if options
    if range
      block.call if range.include?(id)
    else
      block.call
    end
  end


  def custom_codons options, codons
    opt = options
    if opt
      codons = codons | opt[:add] if opt[:add]
      codons = codons.delete_if {|codon| opt[:remove].include?(codon)} if opt[:remove]
    end
    codons
  end

  def table triplets, ncbieaa
    ncbieaa = ncbieaa.split(//)

    hash = {}
    triplets.each_with_index do |codon, i|
      hash[codon] = ncbieaa[i]
    end
    hash
  end

  private :content, :bases, :prepare_range, :custom_range, :custom_codons, :table
end

metadata
ADDED
@@ -0,0 +1,66 @@
--- !ruby/object:Gem::Specification
name: codon_table_parser
version: !ruby/object:Gem::Version
  version: 0.2.0
  prerelease:
platform: ruby
authors:
- Stefan Rohlfing
autorequire:
bindir: bin
cert_chain: []
date: 2011-11-03 00:00:00.000000000Z
dependencies:
- !ruby/object:Gem::Dependency
  name: rspec
  requirement: &13232540 !ruby/object:Gem::Requirement
    none: false
    requirements:
    - - ! '>='
      - !ruby/object:Gem::Version
        version: '0'
  type: :development
  prerelease: false
  version_requirements: *13232540
description: ! ' Parses the NCBI genetic code table, generating hash maps of each
  species'' name, start codons, stop codons and codon table. The output of CodonTableParser
  can be customized easily and used to update the respective constants of BioRuby''s
  CodonTable class whenever the original data has changed.

'
email: stefan.rohlfing@gmail.com
executables: []
extensions: []
extra_rdoc_files: []
files:
- lib/codon_table_parser.rb
- lib/codon_table_parser/version.rb
- README.md
- Rakefile
- LICENSE
homepage: http://github.com/bytesource/codon_table_parser
licenses: []
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: 1.9.1
required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubyforge_project: codon_table_parser
rubygems_version: 1.8.10
signing_key:
specification_version: 3
summary: Parses the NCBI genetic code table, generating hash maps of each species'
  name, start codons, stop codons and codon table.
test_files: []