bio-protparam 0.1.0

@@ -0,0 +1,5 @@
1
+ lib/**/*.rb
2
+ bin/*
3
+ -
4
+ features/**/*.feature
5
+ LICENSE.txt
@@ -0,0 +1,12 @@
1
+ language: ruby
2
+ rvm:
3
+ - 1.9.2
4
+ - 1.9.3
5
+ - jruby-19mode # JRuby in 1.9 mode
6
+ - rbx-19mode
7
+ # - 1.8.7
8
+ # - jruby-18mode # JRuby in 1.8 mode
9
+ # - rbx-18mode
10
+
11
+ # uncomment this line if your project needs to run something other than `rake`:
12
+ # script: bundle exec rspec spec
data/Gemfile ADDED
@@ -0,0 +1,9 @@
1
+ source "http://rubygems.org"
2
+
3
+ gem "bio", ">= 1.4.2"
4
+
5
+ group :development, :test do
6
+ gem "minitest", ">= 0"
7
+ gem "rdoc", "~> 3.12"
8
+ gem "jeweler", "~> 1.8.4"
9
+ end
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2012 hryk <hiroyuki@1vq9.com>
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,47 @@
1
+ # bio-protparam
2
+
3
+ [![Build Status](https://secure.travis-ci.org/hryk/bioruby-protparam.png)](http://travis-ci.org/hryk/bioruby-protparam)
4
+
5
+ `bio-protparam` adds the `Bio::Protparam` class. `Bio::Protparam` has the same interface and
6
+ function as the `Bio::Tools::Protparam` class of BioPerl, except that it calculates the
7
+ parameters locally instead of sending a query to the ExPASy ProtParam tool.
8
+
9
+ **Note: this software is under active development!**
10
+
11
+ ## Installation
12
+
13
+ ```sh
14
+ gem install bio-protparam
15
+ ```
16
+
17
+ ## Usage
18
+
19
+ ```ruby
20
+ require 'bio-protparam'
21
+
22
+ protparam = Bio::Protparam.new("MYNNYNLCHIRTINWEEIITGPSAMYSYVY...")
23
+ # Return Mw
24
+ protparam.molecular_weight
25
+ # Return pI
26
+ protparam.theoretical_pI
27
+
28
+ ```
29
+
30
+ The API doc is on [rdoc.info](http://rdoc.info/github/hryk/bioruby-protparam/). For
31
+ more code examples see the test files in the source tree.
32
+
33
+ ## Cite
34
+
35
+ If you use this software, please cite one of
36
+
37
+ * [BioRuby: bioinformatics software for the Ruby programming language](http://dx.doi.org/10.1093/bioinformatics/btq475)
38
+ * [Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics](http://dx.doi.org/10.1093/bioinformatics/bts080)
39
+
40
+ ## Biogems.info
41
+
42
+ This Biogem is published at [biogems.info](http://biogems.info/index.html#bio-protparam)
43
+
44
+ ## Copyright
45
+
46
+ Copyright (c) 2012 hryk. See LICENSE.txt for further details.
47
+
@@ -0,0 +1,48 @@
1
+ = bio-protparam
2
+
3
+ {<img
4
+ src="https://secure.travis-ci.org/hryk/bioruby-protparam.png"
5
+ />}[http://travis-ci.org/#!/hryk/bioruby-protparam]
6
+
7
+ Full description goes here
8
+
9
+ Note: this software is under active development!
10
+
11
+ == Installation
12
+
13
+ gem install bio-protparam
14
+
15
+ == Usage
16
+
17
+ == Developers
18
+
19
+ To use the library
20
+
21
+ require 'bio-protparam'
22
+
23
+ The API doc is online. For more code examples see also the test files in
24
+ the source tree.
25
+
26
+ == Project home page
27
+
28
+ For information on the source tree, documentation, issues and how to contribute, see
29
+
30
+ http://github.com/hryk/bioruby-protparam
31
+
32
+ The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
33
+
34
+ == Cite
35
+
36
+ If you use this software, please cite one of
37
+
38
+ * {BioRuby: bioinformatics software for the Ruby programming language}[http://dx.doi.org/10.1093/bioinformatics/btq475]
39
+ * {Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics}[http://dx.doi.org/10.1093/bioinformatics/bts080]
40
+
41
+ == Biogems.info
42
+
43
+ This Biogem is published at http://biogems.info/index.html#bio-protparam
44
+
45
+ == Copyright
46
+
47
+ Copyright (c) 2012 hryk. See LICENSE.txt for further details.
48
+
@@ -0,0 +1,42 @@
1
+ # encoding: utf-8
2
+
3
+ require 'rubygems'
4
+ require 'bundler'
5
+ begin
6
+ Bundler.setup(:default, :development)
7
+ rescue Bundler::BundlerError => e
8
+ $stderr.puts e.message
9
+ $stderr.puts "Run `bundle install` to install missing gems"
10
+ exit e.status_code
11
+ end
12
+ require 'rake'
13
+
14
+ require 'jeweler'
15
+ Jeweler::Tasks.new do |gem|
16
+ gem.name = "bio-protparam"
17
+ gem.homepage = "http://github.com/hryk/bioruby-protparam"
18
+ gem.license = "MIT"
19
+ gem.summary = %Q{A Protparam compatible utility for BioRuby.}
20
+ gem.description = %Q{Bio::Protparam has the same interface and function as the Bio::Tools::Protparam class of BioPerl, except that it calculates the parameters locally instead of sending a query to the ExPASy ProtParam tool.}
21
+ gem.email = "hiroyuki@1vq9.com"
22
+ gem.authors = ["hryk"]
23
+ end
24
+ Jeweler::RubygemsDotOrgTasks.new
25
+
26
+ require 'rake/testtask'
27
+ Rake::TestTask.new(:test) do |test|
28
+ test.libs << 'lib' << 'test'
29
+ test.pattern = 'test/**/test_*.rb'
30
+ test.verbose = true
31
+ end
32
+
33
+ task :default => :test
34
+
35
+ require 'rdoc/task'
36
+ Rake::RDocTask.new do |rdoc|
37
+ version = File.exist?('VERSION') ? File.read('VERSION') : ""
38
+ rdoc.rdoc_dir = 'rdoc'
39
+ rdoc.title = "bio-protparam #{version}"
40
+ rdoc.rdoc_files.include('README*')
41
+ rdoc.rdoc_files.include('lib/**/*.rb')
42
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.1.0
@@ -0,0 +1,12 @@
1
+ # Please require your code below, respecting the naming conventions in the
2
+ # bioruby directory tree.
3
+ #
4
+ # For example, say you have a plugin named bio-plugin, the only uncommented
5
+ # line in this file would be
6
+ #
7
+ # require 'bio/bio-plugin/plugin'
8
+ #
9
+ # In this file only require other files. Avoid other source code.
10
+
11
+ require 'bio/util/protparam'
12
+
@@ -0,0 +1,817 @@
1
+ # encoding: utf-8
2
+ #
3
+ #
4
+ # = bio/util/protparam.rb - A Class to Calculate Protein Parameters.
5
+ #
6
+ # Copyright:: Copyright (C) 2012
7
+ # Hiroyuki Nakamura <hiroyuki@1vq9.com>
8
+ # License:: The Ruby License
9
+ #
10
+ require 'bio' # Bio::Sequence is used in #initialize; Rational is built in on Ruby 1.9+
11
+
12
+ module Bio
13
+ ##
14
+ # == Description
15
+ #
16
+ # Bio::Protparam is a class for calculating protein parameters. This class
17
+ # has a similar interface to BioPerl's Bio::Tools::Protparam. However, it
18
+ # calculates the parameters itself instead of sending a query to ExPASy's {ProtParam
19
+ # tool}[http://web.expasy.org/protparam/]{[1]}[rdoc-label:1] as Bio::Tools::Protparam does.
20
+ #
21
+ class Protparam
22
+
23
+ # {IUPAC codes}[http://www.bioinformatics.org/sms2/iupac.html] for amino acids.
24
+ IUPAC_CODE = {
25
+ :I => "Ile",
26
+ :V => "Val",
27
+ :L => "Leu",
28
+ :F => "Phe",
29
+ :C => "Cys",
30
+ :M => "Met",
31
+ :A => "Ala",
32
+ :G => "Gly",
33
+ :T => "Thr",
34
+ :W => "Trp",
35
+ :S => "Ser",
36
+ :Y => "Tyr",
37
+ :P => "Pro",
38
+ :H => "His",
39
+ :E => "Glu",
40
+ :Q => "Gln",
41
+ :D => "Asp",
42
+ :N => "Asn",
43
+ :K => "Lys",
44
+ :R => "Arg",
45
+ :U => "Sec",
46
+ :O => "Pyl",
47
+ :B => "Asx",
48
+ :Z => "Glx",
49
+ :X => "Xaa"
50
+ }
51
+
52
+ # Dipeptide instability weight value for calculating instability index of proteins {[10]}[rdoc-label:10].
53
+ DIWV = {
54
+ :W => {
55
+ :W => 1.0, :C => 1.0, :M => 24.68, :H => 24.68, :Y => 1.0, :F => 1.0, :Q => 1.0,
56
+ :N => 13.34, :I => 1.0, :R => 1.0, :D => 1.0, :P => 1.0, :T => -14.03, :K => 1.0,
57
+ :E => 1.0, :V => -7.49, :S => 1.0, :G => -9.37, :A => -14.03, :L => 13.34
58
+ },
59
+ :C => {
60
+ :W => 24.68, :C => 1.0, :M => 33.6, :H => 33.6, :Y => 1.0, :F => 1.0, :Q => -6.54, :N => 1.0,
61
+ :I => 1.0, :R => 1.0, :D => 20.26, :P => 20.26, :T => 33.6, :K => 1.0, :E => 1.0, :V => -6.54,
62
+ :S => 1.0, :G => 1.0, :A => 1.0, :L => 20.26
63
+ },
64
+ :M => {
65
+ :W => 1.0, :C => 1.0, :M => -1.88, :H => 58.28, :Y => 24.68, :F => 1.0, :Q => -6.54,
66
+ :N => 1.0, :I => 1.0, :R => -6.54, :D => 1.0, :P => 44.94, :T => -1.88, :K => 1.0, :E => 1.0,
67
+ :V => 1.0, :S => 44.94, :G => 1.0, :A => 13.34, :L => 1.0
68
+ },
69
+ :H => {
70
+ :W => -1.88, :C => 1.0, :M => 1.0, :H => 1.0, :Y => 44.94, :F => -9.37, :Q => 1.0,
71
+ :N => 24.68, :I => 44.94, :R => 1.0, :D => 1.0, :P => -1.88, :T => -6.54, :K => 24.68,
72
+ :E => 1.0, :V => 1.0, :S => 1.0, :G => -9.37, :A => 1.0, :L => 1.0
73
+ },
74
+ :Y => {
75
+ :W => -9.37, :C => 1.0, :M => 44.94, :H => 13.34, :Y => 13.34, :F => 1.0, :Q => 1.0,
76
+ :N => 1.0, :I => 1.0, :R => -15.91, :D => 24.68, :P => 13.34, :T => -7.49, :K => 1.0,
77
+ :E => -6.54, :V => 1.0, :S => 1.0, :G => -7.49, :A => 24.68, :L => 1.0
78
+ },
79
+ :F => {
80
+ :W => 1.0, :C => 1.0, :M => 1.0, :H => 1.0, :Y => 33.6, :F => 1.0, :Q => 1.0, :N => 1.0,
81
+ :I => 1.0, :R => 1.0, :D => 13.34, :P => 20.26, :T => 1.0, :K => -14.03, :E => 1.0,
82
+ :V => 1.0, :S => 1.0, :G => 1.0, :A => 1.0, :L => 1.0
83
+ },
84
+ :Q => {
85
+ :W => 1.0, :C => -6.54, :M => 1.0, :H => 1.0, :Y => -6.54, :F => -6.54, :Q => 20.26,
86
+ :N => 1.0, :I => 1.0, :R => 1.0, :D => 20.26, :P => 20.26, :T => 1.0, :K => 1.0, :E => 20.26,
87
+ :V => -6.54, :S => 44.94, :G => 1.0, :A => 1.0, :L => 1.0
88
+ },
89
+ :N => {
90
+ :W => -9.37, :C => -1.88, :M => 1.0, :H => 1.0, :Y => 1.0, :F => -14.03, :Q => -6.54,
91
+ :N => 1.0, :I => 44.94, :R => 1.0, :D => 1.0, :P => -1.88, :T => -7.49, :K => 24.68,
92
+ :E => 1.0, :V => 1.0, :S => 1.0, :G => -14.03, :A => 1.0, :L => 1.0
93
+ },
94
+ :I => {
95
+ :W => 1.0, :C => 1.0, :M => 1.0, :H => 13.34, :Y => 1.0, :F => 1.0, :Q => 1.0, :N => 1.0,
96
+ :I => 1.0, :R => 1.0, :D => 1.0, :P => -1.88, :T => 1.0, :K => -7.49, :E => 44.94,
97
+ :V => -7.49, :S => 1.0, :G => 1.0, :A => 1.0, :L => 20.26
98
+ },
99
+ :R => {
100
+ :W => 58.28, :C => 1.0, :M => 1.0, :H => 20.26, :Y => -6.54, :F => 1.0, :Q => 20.26,
101
+ :N => 13.34, :I => 1.0, :R => 58.28, :D => 1.0, :P => 20.26, :T => 1.0, :K => 1.0, :E => 1.0,
102
+ :V => 1.0, :S => 44.94, :G => -7.49, :A => 1.0, :L => 1.0
103
+ },
104
+ :D => {
105
+ :W => 1.0, :C => 1.0, :M => 1.0, :H => 1.0, :Y => 1.0, :F => -6.54, :Q => 1.0, :N => 1.0,
106
+ :I => 1.0, :R => -6.54, :D => 1.0, :P => 1.0, :T => -14.03, :K => -7.49, :E => 1.0,
107
+ :V => 1.0, :S => 20.26, :G => 1.0, :A => 1.0, :L => 1.0
108
+ },
109
+ :P => {
110
+ :W => -1.88, :C => -6.54, :M => -6.54, :H => 1.0, :Y => 1.0, :F => 20.26, :Q => 20.26,
111
+ :N => 1.0, :I => 1.0, :R => -6.54, :D => -6.54, :P => 20.26, :T => 1.0, :K => 1.0, :E => 18.38,
112
+ :V => 20.26, :S => 20.26, :G => 1.0, :A => 20.26, :L => 1.0
113
+ },
114
+ :T => {
115
+ :W => -14.03, :C => 1.0, :M => 1.0, :H => 1.0, :Y => 1.0, :F => 13.34, :Q => -6.54,
116
+ :N => -14.03, :I => 1.0, :R => 1.0, :D => 1.0, :P => 1.0, :T => 1.0, :K => 1.0, :E => 20.26,
117
+ :V => 1.0, :S => 1.0, :G => -7.49, :A => 1.0, :L => 1.0
118
+ },
119
+ :K => {
120
+ :W => 1.0, :C => 1.0, :M => 33.6, :H => 1.0, :Y => 1.0, :F => 1.0, :Q => 24.68, :N => 1.0,
121
+ :I => -7.49, :R => 33.6, :D => 1.0, :P => -6.54, :T => 1.0, :K => 1.0, :E => 1.0, :V => -7.49,
122
+ :S => 1.0, :G => -7.49, :A => 1.0, :L => -7.49
123
+ },
124
+ :E => {
125
+ :W => -14.03, :C => 44.94, :M => 1.0, :H => -6.54, :Y => 1.0, :F => 1.0, :Q => 20.26,
126
+ :N => 1.0, :I => 20.26, :R => 1.0, :D => 20.26, :P => 20.26, :T => 1.0, :K => 1.0, :E => 33.6,
127
+ :V => 1.0, :S => 20.26, :G => 1.0, :A => 1.0, :L => 1.0
128
+ },
129
+ :V => {
130
+ :W => 1.0, :C => 1.0, :M => 1.0, :H => 1.0, :Y => -6.54, :F => 1.0, :Q => 1.0, :N => 1.0,
131
+ :I => 1.0, :R => 1.0, :D => -14.03, :P => 20.26, :T => -7.49, :K => -1.88, :E => 1.0,
132
+ :V => 1.0, :S => 1.0, :G => -7.49, :A => 1.0, :L => 1.0
133
+ },
134
+ :S => {
135
+ :W => 1.0, :C => 33.6, :M => 1.0, :H => 1.0, :Y => 1.0, :F => 1.0, :Q => 20.26, :N => 1.0,
136
+ :I => 1.0, :R => 20.26, :D => 1.0, :P => 44.94, :T => 1.0, :K => 1.0, :E => 20.26, :V => 1.0,
137
+ :S => 20.26, :G => 1.0, :A => 1.0, :L => 1.0
138
+ },
139
+ :G => {
140
+ :W => 13.34, :C => 1.0, :M => 1.0, :H => 1.0, :Y => -7.49, :F => 1.0, :Q => 1.0, :N => -7.49,
141
+ :I => -7.49, :R => 1.0, :D => 1.0, :P => 1.0, :T => -7.49, :K => -7.49, :E => -6.54,
142
+ :V => 1.0, :S => 1.0, :G => 13.34, :A => -7.49, :L => 1.0
143
+ },
144
+ :A => {
145
+ :W => 1.0, :C => 44.94, :M => 1.0, :H => -7.49, :Y => 1.0, :F => 1.0, :Q => 1.0, :N => 1.0,
146
+ :I => 1.0, :R => 1.0, :D => -7.49, :P => 20.26, :T => 1.0, :K => 1.0, :E => 1.0, :V => 1.0,
147
+ :S => 1.0, :G => 1.0, :A => 1.0, :L => 1.0
148
+ },
149
+ :L => {
150
+ :W => 24.68, :C => 1.0, :M => 1.0, :H => 1.0, :Y => 1.0, :F => 1.0, :Q => 33.6, :N => 1.0,
151
+ :I => 1.0, :R => 20.26, :D => 1.0, :P => 20.26, :T => 1.0, :K => -7.49, :E => 1.0, :V => 1.0,
152
+ :S => 1.0, :G => 1.0, :A => 1.0, :L => 1.0
153
+ }
154
+ }
155
+
156
+ # Estimated half-life (in minutes) of a protein, keyed by organism and N-terminal residue.
157
+ HALFLIFE = {
158
+ :ecoli => {
159
+ :I => 600,
160
+ :V => 600,
161
+ :L => 2,
162
+ :F => 2,
163
+ :C => 600,
164
+ :M => 600,
165
+ :A => 600,
166
+ :G => 600,
167
+ :T => 600,
168
+ :W => 2,
169
+ :S => 600,
170
+ :Y => 2,
171
+ :P => 600,
172
+ :H => 600,
173
+ :E => 600,
174
+ :Q => 600,
175
+ :D => 600,
176
+ :N => 600,
177
+ :K => 2,
178
+ :R => 2,
179
+ :U => 600
180
+ },
181
+ :mammalian => {
182
+ :A => 264,
183
+ :R => 60,
184
+ :N => 84,
185
+ :D => 66,
186
+ :C => 72,
187
+ :Q => 48,
188
+ :E => 60,
189
+ :G => 30,
190
+ :H => 210,
191
+ :I => 1200,
192
+ :L => 330,
193
+ :K => 78,
194
+ :M => 1800,
195
+ :F => 66,
196
+ :P => 1200,
197
+ :S => 114,
198
+ :T => 432,
199
+ :W => 168,
200
+ :Y => 168,
201
+ :V => 6000
202
+ },
203
+ :yeast => {
204
+ :A => 1200,
205
+ :R => 2,
206
+ :N => 3,
207
+ :D => 3,
208
+ :C => 1200,
209
+ :Q => 10,
210
+ :E => 30,
211
+ :G => 1200,
212
+ :H => 10,
213
+ :I => 30,
214
+ :L => 3,
215
+ :K => 3,
216
+ :M => 1200,
217
+ :F => 3,
218
+ :P => 1200,
219
+ :S => 1200,
220
+ :T => 1200,
221
+ :W => 3,
222
+ :Y => 10,
223
+ :V => 1200
224
+ }
225
+ }
226
+
227
+ ## TOP-IDP
228
+ ##
229
+ ## http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2676888/
230
+ ##
231
+ # TOP_IDP = {
232
+ # :I => -0.486,
233
+ # :V => -0.121,
234
+ # :L => -0.326,
235
+ # :F => -0.697,
236
+ # :C => 0.02,
237
+ # :M => -0.397,
238
+ # :A => 0.06,
239
+ # :G => 0.166,
240
+ # :T => 0.059,
241
+ # :W => -0.884,
242
+ # :S => 0.341,
243
+ # :Y => -0.510,
244
+ # :P => 0.987,
245
+ # :H => 0.303,
246
+ # :E => 0.736,
247
+ # :Q => 0.318,
248
+ # :D => 0.192,
249
+ # :N => 0.007,
250
+ # :K => 0.586,
251
+ # :R => 0.180,
252
+ # :U => 0.02
253
+ # }
254
+
255
+ # Hydropathy values for amino acids {[12]}[rdoc-label:12].
256
+ HYDROPATHY = {
257
+ :I => 4.5 ,
258
+ :V => 4.2 ,
259
+ :L => 3.8 ,
260
+ :F => 2.8 ,
261
+ :C => 2.5 ,
262
+ :M => 1.9 ,
263
+ :A => 1.8 ,
264
+ :G => -0.4,
265
+ :T => -0.7,
266
+ :W => -0.9,
267
+ :S => -0.8,
268
+ :Y => -1.3,
269
+ :P => -1.6,
270
+ :H => -3.2,
271
+ :E => -3.5,
272
+ :Q => -3.5,
273
+ :D => -3.5,
274
+ :N => -3.5,
275
+ :K => -3.9,
276
+ :R => -4.5,
277
+ :U => 2.5
278
+ }
279
+
280
+ # {Average isotopic masses of amino acids}[http://web.expasy.org/findmod/findmod_masses.html#AA]
281
+ AVERAGE_MASS = {
282
+ :I => 113.1594,
283
+ :V => 99.1326,
284
+ :L => 113.1594,
285
+ :F => 147.1766,
286
+ :C => 103.1388,
287
+ :M => 131.1926,
288
+ :A => 71.0788,
289
+ :G => 57.0519,
290
+ :T => 101.1051,
291
+ :W => 186.2132,
292
+ :S => 87.0782,
293
+ :Y => 163.1760,
294
+ :P => 97.1167,
295
+ :H => 137.1411,
296
+ :E => 129.1155,
297
+ :Q => 128.1307,
298
+ :D => 115.0886,
299
+ :N => 114.1038,
300
+ :K => 128.1741,
301
+ :R => 156.1875,
302
+ :U => 150.0388
303
+ }
304
+ WATER_MASS = 18.01524
305
+
306
+ # Atomic composition of amino acids.
307
+ ATOM = {
308
+ :I => {:C => 6, :H => 13, :O => 2, :N => 1, :S => 0}, # C6H13NO2
309
+ :V => {:C => 5, :H => 11, :O => 2, :N => 1, :S => 0}, # C5H11NO2
310
+ :L => {:C => 6, :H => 13, :O => 2, :N => 1, :S => 0}, # C6H13NO2
311
+ :F => {:C => 9, :H => 11, :O => 2, :N => 1, :S => 0}, # C9H11NO2
312
+ :C => {:C => 3, :H => 7 , :O => 2, :N => 1, :S => 1}, # C3H7NO2S
313
+ :M => {:C => 5, :H => 11 ,:O => 2, :N => 1, :S => 1}, # C5H11NO2S
314
+ :A => {:C => 3, :H => 7 , :O => 2, :N => 1, :S => 0}, # C3H7NO2
315
+ :G => {:C => 2, :H => 5 , :O => 2, :N => 1, :S => 0}, # C2H5NO2
316
+ :T => {:C => 4, :H => 9 , :O => 3, :N => 1, :S => 0}, # C4H9NO3
317
+ :W => {:C => 11,:H => 12, :O => 2, :N => 2, :S => 0}, # C11H12N2O2
318
+ :S => {:C => 3, :H => 7 , :O => 3, :N => 1, :S => 0}, # C3H7NO3
319
+ :Y => {:C => 9, :H => 11, :O => 3, :N => 1, :S => 0}, # C9H11NO3
320
+ :P => {:C => 5, :H => 9 , :O => 2, :N => 1, :S => 0}, # C5H9NO2
321
+ :H => {:C => 6, :H => 9 , :O => 2, :N => 3, :S => 0}, # C6H9N3O2
322
+ :E => {:C => 5, :H => 9 , :O => 4, :N => 1, :S => 0}, # C5H9NO4
323
+ :Q => {:C => 5, :H => 10, :O => 3, :N => 2, :S => 0}, # C5H10N2O3
324
+ :D => {:C => 4, :H => 7 , :O => 4, :N => 1, :S => 0}, # C4H7NO4
325
+ :N => {:C => 4, :H => 8 , :O => 3, :N => 2, :S => 0}, # C4H8N2O3
326
+ :K => {:C => 6, :H => 14, :O => 2, :N => 2, :S => 0}, # C6H14N2O2
327
+ :R => {:C => 6, :H => 14, :O => 2, :N => 4, :S => 0}, # C6H14N4O2
328
+ }
329
+
330
+ ##
331
+ #
332
+ # pK values from Bjellqvist et al. {[13]}[rdoc-label:13].
333
+ # Taking into account the decrease in pK differences
334
+ # between acids and bases when going from water
335
+ # to 8 M urea, a value of 7.5 has been assigned to the
336
+ # N-terminal residue.
337
+ #
338
+ PK = {
339
+ :cterm => {
340
+ :normal => 3.55, :D => 4.55, :E => 4.75
341
+ },
342
+ :nterm => {
343
+ :A => 7.59, :M => 7.00, :S => 6.93, :P => 8.36,
344
+ :T => 6.82, :V => 7.44, :E => 7.70 , :G => 7.50
345
+ },
346
+ :internal => {
347
+ :D => 4.05, :E => 4.45, :H => 5.98, :C => 9.0,
348
+ :Y => 10.0, :K => 10.0, :R => 12.0
349
+ }
350
+ }
351
+
352
+ def initialize(seq)
353
+ if seq.kind_of?(String) && Bio::Sequence.guess(seq) == Bio::Sequence::AA
354
+ # TODO: has issue.
355
+ @seq = Bio::Sequence::AA.new seq
356
+ elsif seq.kind_of? Bio::Sequence::AA
357
+ @seq = seq
358
+ elsif seq.kind_of?(Bio::Sequence) &&
358
+ seq.guess == Bio::Sequence::AA # guess returns a class, not an instance
359
+ @seq = Bio::Sequence::AA.new(seq.seq)
361
+ else
362
+ raise ArgumentError, "sequence must be an AA sequence"
363
+ end
364
+ end
365
+
366
+ ##
367
+ #
368
+ # Return the number of negative amino acids (D and E) in an AA sequence.
369
+ #
370
+ def num_neg
371
+ @num_neg ||= @seq.count("DE")
372
+ end
373
+
374
+ ##
375
+ #
376
+ # Return the number of positive amino acids (R and K) in an AA sequence.
377
+ #
378
+ def num_pos
379
+ @num_pos ||= @seq.count("RK")
380
+ end
381
+
382
+ ##
383
+ #
384
+ # Return the number of residues in an AA sequence.
385
+ #
386
+ def amino_acid_number
387
+ @seq.length
388
+ end
389
+
390
+ ##
391
+ #
392
+ # Return the number of atoms in a sequence. If type is given, return the
393
+ # number of specific atoms in a sequence.
394
+ #
395
+ def total_atoms(type=nil)
396
+ if !type.nil?
397
+ type = type.to_sym
398
+ if /^(?:C|H|O|N|S){1}$/ !~ type.to_s
399
+ raise ArgumentError, "type must be C/H/O/N/S/nil(all)"
400
+ end
401
+ end
402
+ num_atom = {:C => 0,
403
+ :H => 0,
404
+ :O => 0,
405
+ :N => 0,
406
+ :S => 0}
407
+ each_aa do |aa|
408
+ ATOM[aa].each do |t, num|
409
+ num_atom[t] += num
410
+ end
411
+ end
412
+ num_atom[:H] = num_atom[:H] - 2 * (amino_acid_number - 1)
413
+ num_atom[:O] = num_atom[:O] - (amino_acid_number - 1)
414
+ if type.nil?
415
+ num_atom.values.inject(0) { |sum, num| sum + num }
416
+ else
417
+ num_atom[type]
418
+ end
419
+ end
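The atom bookkeeping in `total_atoms` can be illustrated on its own. The following is a minimal standalone sketch, not part of the gem: the helper name `sketch_atom_counts` and the two-residue table are assumptions made for the example, with formulas copied from `ATOM`. Each peptide bond releases one water molecule, so two hydrogens and one oxygen are subtracted per bond.

```ruby
# Hypothetical excerpt of ATOM (free amino acid formulas), copied from the table above.
SKETCH_ATOM = {
  :G => { :C => 2, :H => 5, :O => 2, :N => 1, :S => 0 },
  :A => { :C => 3, :H => 7, :O => 2, :N => 1, :S => 0 }
}

# Sum the atoms of the free amino acids, then remove one H2O per peptide bond.
def sketch_atom_counts(seq)
  counts = Hash.new(0)
  seq.each_char do |aa|
    SKETCH_ATOM.fetch(aa.to_sym).each { |atom, n| counts[atom] += n }
  end
  bonds = seq.length - 1
  counts[:H] -= 2 * bonds
  counts[:O] -= bonds
  counts
end

p sketch_atom_counts("GA")
```

For the dipeptide Gly-Ala this gives the formula C5H10N2O3, which is what a single condensation of glycine and alanine should produce.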
420
+
421
+ ##
422
+ #
423
+ # Return the number of carbons.
424
+ #
425
+ def num_carbon
426
+ @num_carbon ||= total_atoms :C
427
+ end
428
+
429
+ def num_hydrogen
430
+ @num_hydrogen ||= total_atoms :H
431
+ end
432
+
433
+ ##
434
+ #
435
+ # Return the number of nitrogens.
436
+ #
437
+ def num_nitro
438
+ @num_nitro ||= total_atoms :N
439
+ end
440
+
441
+ ##
442
+ #
443
+ # Return the number of oxygens.
444
+ #
445
+ def num_oxygen
446
+ @num_oxygen ||= total_atoms :O
447
+ end
448
+
449
+ ##
450
+ #
451
+ # Return the number of sulphurs.
452
+ #
453
+ def num_sulphur
454
+ @num_sulphur ||= total_atoms :S
455
+ end
456
+
457
+ ##
458
+ #
459
+ # Calculate molecular weight of an AA sequence.
460
+ #
461
+ # _Protein Mw is calculated by the addition of average isotopic masses of
462
+ # amino acids in the protein and the average isotopic mass of one water
463
+ # molecule._
464
+ #
465
+ def molecular_weight
466
+ @mw ||= begin
467
+ mass = WATER_MASS
468
+ each_aa do |aa|
469
+ mass += AVERAGE_MASS[aa.to_sym]
470
+ end
471
+ (mass * 10).floor().to_f / 10
472
+ end
473
+ end
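The molecular weight rule described above can be tried without the gem. This is a minimal standalone sketch under stated assumptions: the helper name `sketch_molecular_weight` and the three-residue table are hypothetical, and the masses are copied from `AVERAGE_MASS` and `WATER_MASS`.

```ruby
# Hypothetical excerpt of AVERAGE_MASS and WATER_MASS, copied from the tables above.
SKETCH_MASS  = { :G => 57.0519, :A => 71.0788, :S => 87.0782 }
SKETCH_WATER = 18.01524

# Add the residue masses to the mass of one water molecule, then truncate
# (not round) to one decimal place, as molecular_weight does.
def sketch_molecular_weight(seq)
  mass = seq.each_char.inject(SKETCH_WATER) do |sum, aa|
    sum + SKETCH_MASS.fetch(aa.to_sym)
  end
  (mass * 10).floor / 10.0
end

puts sketch_molecular_weight("GAS")
```

Using `fetch` makes an unknown residue fail loudly with a `KeyError` instead of adding `nil` to the sum.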
474
+
475
+ ##
476
+ #
477
+ # Calculate the theoretical pI of an AA sequence with a bisection algorithm.
478
+ # The pK values of Bjellqvist et al. are used to calculate the pI.
479
+ #
480
+ def theoretical_pI
481
+ charges = []
482
+ residue_count().each do |residue|
483
+ charges << charge_proc(residue[:positive],
484
+ residue[:pK],
485
+ residue[:num])
486
+ end
487
+ round(solve_pI(charges), 2)
488
+ end
489
+
490
+ ##
491
+ #
492
+ # Return estimated half_life of an AA sequence.
493
+ #
494
+ # _The half-life is a prediction of the time it takes for half of the
495
+ # amount of protein in a cell to disappear after its synthesis in the
496
+ # cell. ProtParam relies on the "N-end rule", which relates the half-life
497
+ # of a protein to the identity of its N-terminal residue; the prediction
498
+ # is given for 3 model organisms (human, yeast and E.coli)._
499
+ #
500
+ def half_life(species=nil)
501
+ n_end = @seq[0].chr.to_sym
502
+ if species
503
+ HALFLIFE[species][n_end]
504
+ else
505
+ {
506
+ :ecoli => HALFLIFE[:ecoli][n_end],
507
+ :mammalian => HALFLIFE[:mammalian][n_end],
508
+ :yeast => HALFLIFE[:yeast][n_end]
509
+ }
510
+ end
511
+ end
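The N-end rule lookup amounts to indexing a table by the first residue. Here is a minimal standalone sketch, assuming a hypothetical helper `sketch_half_life` and a two-residue excerpt of `HALFLIFE`; the values are copied from the table above and are consistent with ExPASy's figures expressed in minutes.

```ruby
# Hypothetical excerpt of HALFLIFE (values in minutes), copied from the table above.
SKETCH_HALFLIFE = {
  :ecoli     => { :M => 600,  :R => 2 },
  :mammalian => { :M => 1800, :R => 60 },
  :yeast     => { :M => 1200, :R => 2 }
}

# Look up the half-life by N-terminal residue, for one species or for all.
def sketch_half_life(seq, species = nil)
  n_end = seq[0, 1].to_sym
  return SKETCH_HALFLIFE.fetch(species)[n_end] if species
  SKETCH_HALFLIFE.each_with_object({}) do |(name, table), out|
    out[name] = table[n_end]
  end
end

p sketch_half_life("MKT", :mammalian)
p sketch_half_life("RAA")
```

Only the first residue matters, so any two sequences starting with the same residue get the same prediction.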
512
+
513
+ ##
514
+ #
515
+ # Calculate instability index of an AA sequence.
516
+ #
517
+ # _The instability index provides an estimate of the stability of your
518
+ # protein in a test tube. Statistical analysis of 12 unstable and 32
519
+ # stable proteins has revealed [7] that there are certain dipeptides, the
520
+ # occurrence of which is significantly different in the unstable proteins
521
+ # compared with those in the stable ones. The authors of this method have
522
+ # assigned a weight value of instability to each of the 400 different
523
+ # dipeptides (DIWV)._
524
+ #
525
+ def instability_index
526
+ @instability_index ||=
527
+ begin
528
+ instability_sum = 0.0
529
+ i = 0
530
+ while @seq[i+1] != nil
531
+ aa, next_aa = [@seq[i].chr.to_sym, @seq[i+1].chr.to_sym]
532
+ if DIWV.key?(aa) && DIWV[aa].key?(next_aa)
533
+ instability_sum += DIWV[aa][next_aa]
534
+ end
535
+ i += 1
536
+ end
537
+ round((10.0/amino_acid_number.to_f) * instability_sum, 2)
538
+ end
539
+ end
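The instability index is the sum of the dipeptide weights over all adjacent residue pairs, scaled by 10 / sequence length. The sketch below is standalone and illustrative only: `sketch_instability_index` is a hypothetical helper, and the three weights are copied from `DIWV`. Pairs missing from the excerpt contribute nothing, mirroring the `key?` checks above.

```ruby
# Hypothetical excerpt of DIWV; weights copied from the table above.
SKETCH_DIWV = {
  :M => { :K => 1.0 },
  :K => { :R => 33.6 },
  :R => { :R => 58.28 }
}

# Sum DIWV over every adjacent pair, then scale by 10 / sequence length.
def sketch_instability_index(seq)
  residues = seq.each_char.map(&:to_sym)
  total = residues.each_cons(2).inject(0.0) do |sum, (aa, next_aa)|
    sum + SKETCH_DIWV.fetch(aa, {}).fetch(next_aa, 0.0)
  end
  (10.0 / residues.length * total).round(2)
end

puts sketch_instability_index("MKRR")
```

`each_cons(2)` yields the overlapping pairs MK, KR and RR, which replaces the manual index loop.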
540
+
541
+ ##
542
+ #
543
+ # Return whether the sequence is stable or not as a String (stable/unstable).
544
+ #
545
+ # _Protein whose instability index is smaller than 40 is predicted as
546
+ # stable, a value above 40 predicts that the protein may be unstable._
547
+ #
548
+ #
549
+ def stability
550
+ (instability_index <= 40) ? "stable" : "unstable"
551
+ end
552
+
553
+ ##
554
+ #
555
+ # Return true if the sequence is stable.
556
+ #
557
+ def stable?
558
+ instability_index <= 40
559
+ end
560
+
561
+ ##
562
+ #
563
+ # Calculate aliphatic index of an AA sequence.
564
+ #
565
+ # _The aliphatic index of a protein is defined as the relative volume
566
+ # occupied by aliphatic side chains (alanine, valine, isoleucine, and
567
+ # leucine). It may be regarded as a positive factor for the increase of
568
+ # thermostability of globular proteins._
569
+ #
570
+ def aliphatic_index
571
+ aa_map = Hash.new(0.0).merge(aa_comp_map) # residues absent from the sequence count as 0.0
572
+ @aliphatic_index ||= round(aa_map[:A] +
573
+ 2.9 * aa_map[:V] +
574
+ (3.9 * (aa_map[:I] + aa_map[:L])), 2)
575
+ end
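The aliphatic index weights the mole percentages of Ala, Val, Ile and Leu by 1, 2.9, 3.9 and 3.9, as the method above does. This is a minimal standalone sketch with a hypothetical helper name, `sketch_aliphatic_index`. It uses a hash with a default of 0.0 so that sequences lacking one of the four residues still work.

```ruby
# Mole percent of A, V, I and L, weighted as in aliphatic_index above.
def sketch_aliphatic_index(seq)
  percent = Hash.new(0.0)
  seq.each_char { |aa| percent[aa] += 100.0 / seq.length }
  (percent["A"] +
   2.9 * percent["V"] +
   3.9 * (percent["I"] + percent["L"])).round(2)
end

puts sketch_aliphatic_index("AVIL")
puts sketch_aliphatic_index("GGGG")
```

A sequence with no aliphatic residues scores 0.0 rather than raising an error.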
576
+
577
+ ##
578
+ #
579
+ # Calculate GRAVY score of an AA sequence.
580
+ #
581
+ # _The GRAVY(Grand Average of Hydropathy) value for a peptide or protein
582
+ # is calculated as the sum of hydropathy values [9] of all the amino acids,
583
+ # divided by the number of residues in the sequence._
584
+ #
585
+ def gravy
586
+ @gravy ||= begin
587
+ hydropathy_sum = 0.0
588
+ each_aa do |aa|
589
+ hydropathy_sum += HYDROPATHY[aa]
590
+ end
591
+ round(hydropathy_sum / @seq.length.to_f, 3)
592
+ end
593
+ end
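GRAVY is the mean hydropathy per residue. A minimal standalone sketch follows, assuming a hypothetical helper `sketch_gravy` and a three-residue excerpt of `HYDROPATHY` with values copied from the table above.

```ruby
# Hypothetical excerpt of HYDROPATHY (Kyte-Doolittle), copied from the table above.
SKETCH_HYDROPATHY = { :I => 4.5, :K => -3.9, :G => -0.4 }

# Average the hydropathy values over the sequence, rounded to three decimals.
def sketch_gravy(seq)
  total = seq.each_char.inject(0.0) do |sum, aa|
    sum + SKETCH_HYDROPATHY.fetch(aa.to_sym)
  end
  (total / seq.length).round(3)
end

puts sketch_gravy("IKG")
```

A positive score suggests a more hydrophobic sequence and a negative score a more hydrophilic one.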
594
+
595
+ ##
596
+ #
597
+ # Calculate the percentage composition of an AA sequence as a Hash object.
598
+ # It returns the percentage of a given amino acid if aa_code is not nil.
599
+ #
600
+ def aa_comp(aa_code=nil)
601
+ if aa_code.nil?
602
+ aa_map = {}
603
+ IUPAC_CODE.keys.each do |k|
604
+ aa_map[k] = 0.0
605
+ end
606
+ aa_map.update(aa_comp_map){|k,_,v| round(v, 1) }
607
+ else
608
+ round(aa_comp_map[aa_code], 1)
609
+ end
610
+ end
611
+
612
+ private
613
+
614
+ def aa_comp_map
615
+ @aa_comp_map ||=
616
+ begin
617
+ aa_map = {}
618
+ aa_comp = {}
619
+ sum = 0
620
+ each_aa do |aa|
621
+ if aa_map.key? aa
622
+ aa_map[aa] += 1
623
+ else
624
+ aa_map[aa] = 1
625
+ end
626
+ sum += 1
627
+ end
628
+ aa_map.each {|aa, count| aa_comp[aa] = (Rational(count,sum) * 100).to_f }
629
+ aa_comp
630
+ end
631
+ end
632
+
633
+ def each_aa
634
+ @seq.each_byte do |x|
635
+ yield x.chr.to_sym
636
+ end
637
+ end
638
+
639
+ def positive? residue
640
+ (residue == "H" || residue == "R" || residue == "K")
641
+ end
642
+
643
+ #
644
+ # Return proc calculating charge of a residue.
645
+ #
646
+ def charge_proc positive, pK, num
647
+ if positive
648
+ lambda {|ph|
649
+ num.to_f / (1.0 + 10.0 ** (ph - pK))
650
+ }
651
+ else
652
+ lambda {|ph|
653
+ (-1.0 * num.to_f) / (1.0 + 10.0 ** (pK - ph))
654
+ }
655
+ end
656
+ end
657
+
658
+ #
659
+ # Transform AA sequence into residue count
660
+ #
661
+ def residue_count
662
+ counted = []
663
+ # N-terminal
664
+ n_term = @seq[0].chr
665
+ if PK[:nterm].key? n_term.to_sym
666
+ counted << {
667
+ :num => 1,
668
+ :residue => n_term.to_sym,
669
+ :pK => PK[:nterm][n_term.to_sym],
670
+ :positive => positive?(n_term)
671
+ }
672
+ elsif PK[:internal].key? n_term.to_sym # PK has no :normal table; fall back to the side-chain pK
673
+ counted << {
674
+ :num => 1,
675
+ :residue => n_term.to_sym,
676
+ :pK => PK[:internal][n_term.to_sym],
677
+ :positive => positive?(n_term)
678
+ }
679
+ end
680
+ # Internal
681
+ tmp_internal = {}
682
+ @seq[1,(@seq.length-2)].each_byte do |x|
683
+ aa = x.chr.to_sym
684
+ if PK[:internal].key? aa
685
+ if tmp_internal.key? aa
686
+ tmp_internal[aa][:num] += 1
687
+ else
688
+ tmp_internal[aa] = {
689
+ :num => 1,
690
+ :residue => aa,
691
+ :pK => PK[:internal][aa],
692
+ :positive => positive?(aa.to_s)
693
+ }
694
+ end
695
+ end
696
+ end
697
+ tmp_internal.each do |aa, val|
698
+ counted << val
699
+ end
700
+ # C-terminal
701
+ c_term = @seq[-1].chr
702
+ if PK[:cterm].key? c_term.to_sym
703
+ counted << {
704
+ :num => 1,
705
+ :residue => c_term.to_sym,
706
+ :pK => PK[:cterm][c_term.to_sym],
707
+ :positive => positive?(c_term)
708
+ }
709
+ end
710
+ counted
711
+ end
712
+
713
+ #
714
+ # Solve for the pI value with a bisection algorithm.
715
+ #
716
+ def solve_pI charges
717
+ state = {
718
+ :ph => 0.0,
719
+ :charges => charges,
720
+ :pI => nil,
721
+ :ph_prev => 0.0,
722
+ :ph_next => 14.0,
723
+ :net_charge => 0.0
724
+ }
725
+ error = false
726
+ # epsilon is the precision: pI = pH ± epsilon
727
+ epsilon = 0.001
728
+
729
+ loop do
730
+ # Reset net charge
731
+ state[:net_charge] = 0.0
732
+ # Calculate net charge
733
+ state[:charges].each do |charge_proc|
734
+ state[:net_charge] += charge_proc.call state[:ph]
735
+ end
736
+
737
+ # Something is wrong - pH is higher than 14
738
+ if state[:ph] >= 14.0
739
+ error = true
740
+ break
741
+ end
742
+
743
+ # Making decision
744
+ temp_ph = 0.0
745
+ if state[:net_charge] <= 0.0
746
+ temp_ph = state[:ph]
747
+ state[:ph] = state[:ph] - ((state[:ph] - state[:ph_prev]) / 2.0)
748
+ state[:ph_next] = temp_ph
749
+ else
750
+ temp_ph = state[:ph]
751
+ state[:ph] = state[:ph] + ((state[:ph_next] - state[:ph]) / 2.0)
752
+ state[:ph_prev] = temp_ph
753
+ end
754
+
755
+ if (state[:ph] - state[:ph_prev] < epsilon) &&
756
+ (state[:ph_next] - state[:ph] < epsilon)
757
+ state[:pI] = state[:ph]
758
+ break
759
+ end
760
+ end
761
+
762
+ if !state[:pI].nil? && !error
763
+ state[:pI]
764
+ else
765
+ raise "Failed to calculate pI: pH reached 14 or higher"
766
+ end
767
+ end
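The pI search can be illustrated with a plain bracketing bisection. This is a simplified standalone sketch, not the stepping scheme used in `solve_pI`: the names `SKETCH_GROUPS`, `sketch_net_charge` and `sketch_solve_pi` are hypothetical, the two pK values are copied from `PK[:internal]`, and the termini are ignored for brevity. The charge of each group follows the same Henderson-Hasselbalch expressions as `charge_proc`.

```ruby
# Each ionizable group: [positive?, pK, count]. Illustrative values copied
# from PK[:internal] above; terminal groups are left out of this sketch.
SKETCH_GROUPS = [
  [true,  10.0, 1],  # Lys side chain
  [false, 4.05, 1]   # Asp side chain
]

# Net charge at a given pH (same formulas as charge_proc).
def sketch_net_charge(groups, ph)
  groups.inject(0.0) do |sum, (positive, pk, count)|
    if positive
      sum + count / (1.0 + 10.0**(ph - pk))
    else
      sum - count / (1.0 + 10.0**(pk - ph))
    end
  end
end

# Bisection: the net charge decreases monotonically with pH, so halve the
# bracket [low, high] until it is narrower than epsilon.
def sketch_solve_pi(groups, epsilon = 0.001)
  low, high = 0.0, 14.0
  while high - low > epsilon
    mid = (low + high) / 2.0
    if sketch_net_charge(groups, mid) > 0.0
      low = mid
    else
      high = mid
    end
  end
  ((low + high) / 2.0).round(2)
end

puts sketch_solve_pi(SKETCH_GROUPS)
```

With one basic and one acidic group the charges cancel midway between the two pK values, (10.0 + 4.05) / 2 = 7.025, so the result should land within the search precision of that value.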
768
+
769
+ def round(num, ndigits=0)
770
+ (num * (10 ** ndigits)).round().to_f / (10 ** ndigits).to_f
771
+ end
772
+
773
+ # --------------------------------
774
+ # :section: References
775
+ #
776
+ #
777
+ # 1. Protein Identification and Analysis Tools on the ExPASy Server;
778
+ # Gasteiger E., Hoogland C., Gattiker A., Duvaud S., Wilkins M.R.,
779
+ # Appel R.D., Bairoch A.; (In) John M. Walker (ed): The Proteomics
780
+ # Protocols Handbook, Humana Press (2005). pp. 571-607
781
+ # 2. Pace, C.N., Vajdos, F., Fee, L., Grimsley, G., and Gray, T. (1995)
782
+ # How to measure and predict the molar absorption coefficient of a
783
+ # protein. Protein Sci. 11, 2411-2423.
784
+ # 3. Edelhoch, H. (1967) Spectroscopic determination of tryptophan and
785
+ # tyrosine in proteins. Biochemistry 6, 1948-1954.
786
+ # 4. Gill, S.C. and von Hippel, P.H. (1989) Calculation of protein
787
+ # extinction coefficients from amino acid sequence data. Anal. Biochem.
788
+ # 182:319-326(1989).
789
+ # 5. Bachmair, A., Finley, D. and Varshavsky, A. (1986) In vivo half-life
790
+ # of a protein is a function of its amino-terminal residue. Science 234,
791
+ # 179-186.
792
+ # 6. Gonda, D.K., Bachmair, A., Wunning, I., Tobias, J.W., Lane, W.S. and
793
+ # Varshavsky, A. J. (1989) Universality and structure of the N-end rule.
794
+ # J. Biol. Chem. 264, 16700-16712.
795
+ # 7. Tobias, J.W., Shrader, T.E., Rocap, G. and Varshavsky, A. (1991) The
796
+ # N-end rule in bacteria. Science 254, 1374-1377.
797
+ # 8. Ciechanover, A. and Schwartz, A.L. (1989) How are substrates
798
+ # recognized by the ubiquitin-mediated proteolytic system? Trends Biochem.
799
+ # Sci. 14, 483-488.
800
+ # 9. Varshavsky, A. (1997) The N-end rule pathway of protein degradation.
801
+ # Genes Cells 2, 13-28.
802
+ # 10. Guruprasad, K., Reddy, B.V.B. and Pandit, M.W. (1990) Correlation
803
+ # between stability of a protein and its dipeptide composition: a novel
804
+ # approach for predicting in vivo stability of a protein from its primary
805
+ # sequence. Protein Eng. 4,155-161.
806
+ # 11. Ikai, A.J. (1980) Thermostability and aliphatic index of globular
807
+ # proteins. J. Biochem. 88, 1895-1898.
808
+ # 12. Kyte, J. and Doolittle, R.F. (1982) A simple method for displaying
809
+ # the hydropathic character of a protein. J. Mol. Biol. 157, 105-132.
810
+ # 13. Bjellqvist, B.,Hughes, G.J., Pasquali, Ch., Paquet, N., Ravier, F.,
811
+ # Sanchez, J.-Ch., Frutiger, S. & Hochstrasser, D.F. The focusing positions
812
+ # of polypeptides in immobilized pH gradients can be predicted from their
813
+ # amino acid sequences. Electrophoresis 1993, 14, 1023-1031.
814
+ #
815
+ # --------------------------------
816
+ end
817
+ end