bio-protparam 0.1.0

@@ -0,0 +1,5 @@
1
+ lib/**/*.rb
2
+ bin/*
3
+ -
4
+ features/**/*.feature
5
+ LICENSE.txt
@@ -0,0 +1,12 @@
1
+ language: ruby
2
+ rvm:
3
+ - 1.9.2
4
+ - 1.9.3
5
+ - jruby-19mode # JRuby in 1.9 mode
6
+ - rbx-19mode
7
+ # - 1.8.7
8
+ # - jruby-18mode # JRuby in 1.8 mode
9
+ # - rbx-18mode
10
+
11
+ # uncomment this line if your project needs to run something other than `rake`:
12
+ # script: bundle exec rspec spec
data/Gemfile ADDED
@@ -0,0 +1,9 @@
1
+ source "http://rubygems.org"
2
+
3
+ gem "bio", ">= 1.4.2"
4
+
5
+ group :development, :test do
6
+ gem "minitest", ">= 0"
7
+ gem "rdoc", "~> 3.12"
8
+ gem "jeweler", "~> 1.8.4"
9
+ end
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2012 hryk <hiroyuki@1vq9.com>
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,47 @@
1
+ # bio-protparam
2
+
3
+ [![Build Status](https://secure.travis-ci.org/hryk/bioruby-protparam.png)](http://travis-ci.org/hryk/bioruby-protparam)
4
+
5
+ `bio-protparam` adds the Bio::Protparam class. Bio::Protparam has the same interface and
6
+ function as the Bio::Tools::Protparam class of BioPerl, except that it calculates
7
+ parameters locally instead of sending a query to the ExPASy ProtParam tool.
8
+
9
+ **Note: this software is under active development!**
10
+
11
+ ## Installation
12
+
13
+ ```sh
14
+ gem install bio-protparam
15
+ ```
16
+
17
+ ## Usage
18
+
19
+ ```ruby
20
+ require 'bio-protparam'
21
+
22
+ protparam = Bio::Protparam.new("MYNNYNLCHIRTINWEEIITGPSAMYSYVY...")
23
+ # Return Mw
24
+ protparam.molecular_weight
25
+ # Return pI
26
+ protparam.theoretical_pI
27
+
28
+ ```
29
+
30
+ The API doc is on [rdoc.info](http://rdoc.info/github/hryk/bioruby-protparam/). For
31
+ more code examples see the test files in the source tree.
32
+
33
+ ## Cite
34
+
35
+ If you use this software, please cite one of
36
+
37
+ * [BioRuby: bioinformatics software for the Ruby programming language](http://dx.doi.org/10.1093/bioinformatics/btq475)
38
+ * [Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics](http://dx.doi.org/10.1093/bioinformatics/bts080)
39
+
40
+ ## Biogems.info
41
+
42
+ This Biogem is published at [biogems.info](http://biogems.info/index.html#bio-protparam)
43
+
44
+ ## Copyright
45
+
46
+ Copyright (c) 2012 hryk. See LICENSE.txt for further details.
47
+
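The `molecular_weight` call in the README usage can be illustrated with a minimal, self-contained sketch. It assumes the rule documented in the library source: sum the average residue masses, add one water molecule, and truncate to one decimal place. The names `residue_mass` and `sketch_molecular_weight` are hypothetical and exist only in this example; the three masses and the water mass are copied from the gem's `AVERAGE_MASS` and `WATER_MASS` constants.

```ruby
# Average residue masses (Da) for three amino acids, copied from the gem's
# AVERAGE_MASS table; the remaining residues are omitted for brevity.
residue_mass = { "G" => 57.0519, "A" => 71.0788, "S" => 87.0782 }

# Sum the residue masses, add one water molecule for the free termini,
# and truncate (not round) to one decimal place.
def sketch_molecular_weight(sequence, residue_mass, water_mass = 18.01524)
  mass = sequence.each_char.inject(water_mass) do |sum, aa|
    sum + residue_mass.fetch(aa) # raises KeyError for residues not in the table
  end
  (mass * 10).floor / 10.0
end

puts sketch_molecular_weight("GAS", residue_mass) # prints 233.2
```

Using `fetch` instead of `[]` is a design choice of this sketch only: an unknown residue fails loudly rather than adding `nil` to the running sum.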
@@ -0,0 +1,48 @@
1
+ = bio-protparam
2
+
3
+ {<img
4
+ src="https://secure.travis-ci.org/hryk/bioruby-protparam.png"
5
+ />}[http://travis-ci.org/#!/hryk/bioruby-protparam]
6
+
7
+ Full description goes here
8
+
9
+ Note: this software is under active development!
10
+
11
+ == Installation
12
+
13
+ gem install bio-protparam
14
+
15
+ == Usage
16
+
17
+ == Developers
18
+
19
+ To use the library
20
+
21
+ require 'bio-protparam'
22
+
23
+ The API doc is online. For more code examples see also the test files in
24
+ the source tree.
25
+
26
+ == Project home page
27
+
28
+ For information on the source tree, documentation, issues and how to contribute, see
29
+
30
+ http://github.com/hryk/bioruby-protparam
31
+
32
+ The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
33
+
34
+ == Cite
35
+
36
+ If you use this software, please cite one of
37
+
38
+ * {BioRuby: bioinformatics software for the Ruby programming language}[http://dx.doi.org/10.1093/bioinformatics/btq475]
39
+ * {Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics}[http://dx.doi.org/10.1093/bioinformatics/bts080]
40
+
41
+ == Biogems.info
42
+
43
+ This Biogem is published at http://biogems.info/index.html#bio-protparam
44
+
45
+ == Copyright
46
+
47
+ Copyright (c) 2012 hryk. See LICENSE.txt for further details.
48
+
@@ -0,0 +1,42 @@
1
+ # encoding: utf-8
2
+
3
+ require 'rubygems'
4
+ require 'bundler'
5
+ begin
6
+ Bundler.setup(:default, :development)
7
+ rescue Bundler::BundlerError => e
8
+ $stderr.puts e.message
9
+ $stderr.puts "Run `bundle install` to install missing gems"
10
+ exit e.status_code
11
+ end
12
+ require 'rake'
13
+
14
+ require 'jeweler'
15
+ Jeweler::Tasks.new do |gem|
16
+ gem.name = "bio-protparam"
17
+ gem.homepage = "http://github.com/hryk/bioruby-protparam"
18
+ gem.license = "MIT"
19
+ gem.summary = %Q{A Protparam compatible utility for BioRuby.}
20
+ gem.description = %Q{Bio::Protparam has the same interface and function as the Bio::Tools::Protparam class of BioPerl, except that it calculates parameters instead of sending a query to the ExPASy ProtParam tool.}
21
+ gem.email = "hiroyuki@1vq9.com"
22
+ gem.authors = ["hryk"]
23
+ end
24
+ Jeweler::RubygemsDotOrgTasks.new
25
+
26
+ require 'rake/testtask'
27
+ Rake::TestTask.new(:test) do |test|
28
+ test.libs << 'lib' << 'test'
29
+ test.pattern = 'test/**/test_*.rb'
30
+ test.verbose = true
31
+ end
32
+
33
+ task :default => :test
34
+
35
+ require 'rdoc/task'
36
+ Rake::RDocTask.new do |rdoc|
37
+ version = File.exist?('VERSION') ? File.read('VERSION') : ""
38
+ rdoc.rdoc_dir = 'rdoc'
39
+ rdoc.title = "bio-protparam #{version}"
40
+ rdoc.rdoc_files.include('README*')
41
+ rdoc.rdoc_files.include('lib/**/*.rb')
42
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.1.0
@@ -0,0 +1,12 @@
1
+ # Please require your code below, respecting the naming conventions in the
2
+ # bioruby directory tree.
3
+ #
4
+ # For example, say you have a plugin named bio-plugin, the only uncommented
5
+ # line in this file would be
6
+ #
7
+ # require 'bio/bio-plugin/plugin'
8
+ #
9
+ # In this file only require other files. Avoid other source code.
10
+
11
+ require 'bio/util/protparam'
12
+
@@ -0,0 +1,817 @@
1
+ # encoding: utf-8
2
+ #
3
+ #
4
+ # = bio/util/protparam.rb - A Class to Calculate Protein Parameters.
5
+ #
6
+ # Copyright:: Copyright (C) 2012
7
+ # Hiroyuki Nakamura <hiroyuki@1vq9.com>
8
+ # License:: The Ruby License
9
+ #
10
+ require 'rational'
11
+
12
+ module Bio
13
+ ##
14
+ # == Description
15
+ #
16
+ # Bio::Protparam is a class for calculating protein parameters. This class
17
+ # has a similar interface to BioPerl's Bio::Tools::Protparam. However, it
18
+ # calculates parameters instead of sending a query to ExPASy's {ProtParam
19
+ # tool}[http://web.expasy.org/protparam/]{[1]}[rdoc-label:1] as Bio::Tools::Protparam does.
20
+ #
21
+ class Protparam
22
+
23
+ # {IUPAC codes}[http://www.bioinformatics.org/sms2/iupac.html] for amino acids.
24
+ IUPAC_CODE = {
25
+ :I => "Ile",
26
+ :V => "Val",
27
+ :L => "Leu",
28
+ :F => "Phe",
29
+ :C => "Cys",
30
+ :M => "Met",
31
+ :A => "Ala",
32
+ :G => "Gly",
33
+ :T => "Thr",
34
+ :W => "Trp",
35
+ :S => "Ser",
36
+ :Y => "Tyr",
37
+ :P => "Pro",
38
+ :H => "His",
39
+ :E => "Glu",
40
+ :Q => "Gln",
41
+ :D => "Asp",
42
+ :N => "Asn",
43
+ :K => "Lys",
44
+ :R => "Arg",
45
+ :U => "Sec",
46
+ :O => "Pyl",
47
+ :B => "Asx",
48
+ :Z => "Glx",
49
+ :X => "Xaa"
50
+ }
51
+
52
+ # Dipeptide instability weight value for calculating instability index of proteins {[10]}[rdoc-label:10].
53
+ DIWV = {
54
+ :W => {
55
+ :W => 1.0, :C => 1.0, :M => 24.68, :H => 24.68, :Y => 1.0, :F => 1.0, :Q => 1.0,
56
+ :N => 13.34, :I => 1.0, :R => 1.0, :D => 1.0, :P => 1.0, :T => -14.03, :K => 1.0,
57
+ :E => 1.0, :V => -7.49, :S => 1.0, :G => -9.37, :A => -14.03, :L => 13.34
58
+ },
59
+ :C => {
60
+ :W => 24.68, :C => 1.0, :M => 33.6, :H => 33.6, :Y => 1.0, :F => 1.0, :Q => -6.54, :N => 1.0,
61
+ :I => 1.0, :R => 1.0, :D => 20.26, :P => 20.26, :T => 33.6, :K => 1.0, :E => 1.0, :V => -6.54,
62
+ :S => 1.0, :G => 1.0, :A => 1.0, :L => 20.26
63
+ },
64
+ :M => {
65
+ :W => 1.0, :C => 1.0, :M => -1.88, :H => 58.28, :Y => 24.68, :F => 1.0, :Q => -6.54,
66
+ :N => 1.0, :I => 1.0, :R => -6.54, :D => 1.0, :P => 44.94, :T => -1.88, :K => 1.0, :E => 1.0,
67
+ :V => 1.0, :S => 44.94, :G => 1.0, :A => 13.34, :L => 1.0
68
+ },
69
+ :H => {
70
+ :W => -1.88, :C => 1.0, :M => 1.0, :H => 1.0, :Y => 44.94, :F => -9.37, :Q => 1.0,
71
+ :N => 24.68, :I => 44.94, :R => 1.0, :D => 1.0, :P => -1.88, :T => -6.54, :K => 24.68,
72
+ :E => 1.0, :V => 1.0, :S => 1.0, :G => -9.37, :A => 1.0, :L => 1.0
73
+ },
74
+ :Y => {
75
+ :W => -9.37, :C => 1.0, :M => 44.94, :H => 13.34, :Y => 13.34, :F => 1.0, :Q => 1.0,
76
+ :N => 1.0, :I => 1.0, :R => -15.91, :D => 24.68, :P => 13.34, :T => -7.49, :K => 1.0,
77
+ :E => -6.54, :V => 1.0, :S => 1.0, :G => -7.49, :A => 24.68, :L => 1.0
78
+ },
79
+ :F => {
80
+ :W => 1.0, :C => 1.0, :M => 1.0, :H => 1.0, :Y => 33.6, :F => 1.0, :Q => 1.0, :N => 1.0,
81
+ :I => 1.0, :R => 1.0, :D => 13.34, :P => 20.26, :T => 1.0, :K => -14.03, :E => 1.0,
82
+ :V => 1.0, :S => 1.0, :G => 1.0, :A => 1.0, :L => 1.0
83
+ },
84
+ :Q => {
85
+ :W => 1.0, :C => -6.54, :M => 1.0, :H => 1.0, :Y => -6.54, :F => -6.54, :Q => 20.26,
86
+ :N => 1.0, :I => 1.0, :R => 1.0, :D => 20.26, :P => 20.26, :T => 1.0, :K => 1.0, :E => 20.26,
87
+ :V => -6.54, :S => 44.94, :G => 1.0, :A => 1.0, :L => 1.0
88
+ },
89
+ :N => {
90
+ :W => -9.37, :C => -1.88, :M => 1.0, :H => 1.0, :Y => 1.0, :F => -14.03, :Q => -6.54,
91
+ :N => 1.0, :I => 44.94, :R => 1.0, :D => 1.0, :P => -1.88, :T => -7.49, :K => 24.68,
92
+ :E => 1.0, :V => 1.0, :S => 1.0, :G => -14.03, :A => 1.0, :L => 1.0
93
+ },
94
+ :I => {
95
+ :W => 1.0, :C => 1.0, :M => 1.0, :H => 13.34, :Y => 1.0, :F => 1.0, :Q => 1.0, :N => 1.0,
96
+ :I => 1.0, :R => 1.0, :D => 1.0, :P => -1.88, :T => 1.0, :K => -7.49, :E => 44.94,
97
+ :V => -7.49, :S => 1.0, :G => 1.0, :A => 1.0, :L => 20.26
98
+ },
99
+ :R => {
100
+ :W => 58.28, :C => 1.0, :M => 1.0, :H => 20.26, :Y => -6.54, :F => 1.0, :Q => 20.26,
101
+ :N => 13.34, :I => 1.0, :R => 58.28, :D => 1.0, :P => 20.26, :T => 1.0, :K => 1.0, :E => 1.0,
102
+ :V => 1.0, :S => 44.94, :G => -7.49, :A => 1.0, :L => 1.0
103
+ },
104
+ :D => {
105
+ :W => 1.0, :C => 1.0, :M => 1.0, :H => 1.0, :Y => 1.0, :F => -6.54, :Q => 1.0, :N => 1.0,
106
+ :I => 1.0, :R => -6.54, :D => 1.0, :P => 1.0, :T => -14.03, :K => -7.49, :E => 1.0,
107
+ :V => 1.0, :S => 20.26, :G => 1.0, :A => 1.0, :L => 1.0
108
+ },
109
+ :P => {
110
+ :W => -1.88, :C => -6.54, :M => -6.54, :H => 1.0, :Y => 1.0, :F => 20.26, :Q => 20.26,
111
+ :N => 1.0, :I => 1.0, :R => -6.54, :D => -6.54, :P => 20.26, :T => 1.0, :K => 1.0, :E => 18.38,
112
+ :V => 20.26, :S => 20.26, :G => 1.0, :A => 20.26, :L => 1.0
113
+ },
114
+ :T => {
115
+ :W => -14.03, :C => 1.0, :M => 1.0, :H => 1.0, :Y => 1.0, :F => 13.34, :Q => -6.54,
116
+ :N => -14.03, :I => 1.0, :R => 1.0, :D => 1.0, :P => 1.0, :T => 1.0, :K => 1.0, :E => 20.26,
117
+ :V => 1.0, :S => 1.0, :G => -7.49, :A => 1.0, :L => 1.0
118
+ },
119
+ :K => {
120
+ :W => 1.0, :C => 1.0, :M => 33.6, :H => 1.0, :Y => 1.0, :F => 1.0, :Q => 24.68, :N => 1.0,
121
+ :I => -7.49, :R => 33.6, :D => 1.0, :P => -6.54, :T => 1.0, :K => 1.0, :E => 1.0, :V => -7.49,
122
+ :S => 1.0, :G => -7.49, :A => 1.0, :L => -7.49
123
+ },
124
+ :E => {
125
+ :W => -14.03, :C => 44.94, :M => 1.0, :H => -6.54, :Y => 1.0, :F => 1.0, :Q => 20.26,
126
+ :N => 1.0, :I => 20.26, :R => 1.0, :D => 20.26, :P => 20.26, :T => 1.0, :K => 1.0, :E => 33.6,
127
+ :V => 1.0, :S => 20.26, :G => 1.0, :A => 1.0, :L => 1.0
128
+ },
129
+ :V => {
130
+ :W => 1.0, :C => 1.0, :M => 1.0, :H => 1.0, :Y => -6.54, :F => 1.0, :Q => 1.0, :N => 1.0,
131
+ :I => 1.0, :R => 1.0, :D => -14.03, :P => 20.26, :T => -7.49, :K => -1.88, :E => 1.0,
132
+ :V => 1.0, :S => 1.0, :G => -7.49, :A => 1.0, :L => 1.0
133
+ },
134
+ :S => {
135
+ :W => 1.0, :C => 33.6, :M => 1.0, :H => 1.0, :Y => 1.0, :F => 1.0, :Q => 20.26, :N => 1.0,
136
+ :I => 1.0, :R => 20.26, :D => 1.0, :P => 44.94, :T => 1.0, :K => 1.0, :E => 20.26, :V => 1.0,
137
+ :S => 20.26, :G => 1.0, :A => 1.0, :L => 1.0
138
+ },
139
+ :G => {
140
+ :W => 13.34, :C => 1.0, :M => 1.0, :H => 1.0, :Y => -7.49, :F => 1.0, :Q => 1.0, :N => -7.49,
141
+ :I => -7.49, :R => 1.0, :D => 1.0, :P => 1.0, :T => -7.49, :K => -7.49, :E => -6.54,
142
+ :V => 1.0, :S => 1.0, :G => 13.34, :A => -7.49, :L => 1.0
143
+ },
144
+ :A => {
145
+ :W => 1.0, :C => 44.94, :M => 1.0, :H => -7.49, :Y => 1.0, :F => 1.0, :Q => 1.0, :N => 1.0,
146
+ :I => 1.0, :R => 1.0, :D => -7.49, :P => 20.26, :T => 1.0, :K => 1.0, :E => 1.0, :V => 1.0,
147
+ :S => 1.0, :G => 1.0, :A => 1.0, :L => 1.0
148
+ },
149
+ :L => {
150
+ :W => 24.68, :C => 1.0, :M => 1.0, :H => 1.0, :Y => 1.0, :F => 1.0, :Q => 33.6, :N => 1.0,
151
+ :I => 1.0, :R => 20.26, :D => 1.0, :P => 20.26, :T => 1.0, :K => -7.49, :E => 1.0, :V => 1.0,
152
+ :S => 1.0, :G => 1.0, :A => 1.0, :L => 1.0
153
+ }
154
+ }
155
+
156
+ # Estimated half-life of the N-terminal residue of a protein.
157
+ HALFLIFE = {
158
+ :ecoli => {
159
+ :I => 600,
160
+ :V => 600,
161
+ :L => 2,
162
+ :F => 2,
163
+ :C => 600,
164
+ :M => 600,
165
+ :A => 600,
166
+ :G => 600,
167
+ :T => 600,
168
+ :W => 2,
169
+ :S => 600,
170
+ :Y => 2,
171
+ :P => 600,
172
+ :H => 600,
173
+ :E => 600,
174
+ :Q => 600,
175
+ :D => 600,
176
+ :N => 600,
177
+ :K => 2,
178
+ :R => 2,
179
+ :U => 600
180
+ },
181
+ :mammalian => {
182
+ :A => 264,
183
+ :R => 60,
184
+ :N => 84,
185
+ :D => 66,
186
+ :C => 72,
187
+ :Q => 48,
188
+ :E => 60,
189
+ :G => 30,
190
+ :H => 210,
191
+ :I => 1200,
192
+ :L => 330,
193
+ :K => 78,
194
+ :M => 1800,
195
+ :F => 66,
196
+ :P => 1200,
197
+ :S => 114,
198
+ :T => 432,
199
+ :W => 168,
200
+ :Y => 168,
201
+ :V => 6000
202
+ },
203
+ :yeast => {
204
+ :A => 1200,
205
+ :R => 2,
206
+ :N => 3,
207
+ :D => 3,
208
+ :C => 1200,
209
+ :Q => 10,
210
+ :E => 30,
211
+ :G => 1200,
212
+ :H => 10,
213
+ :I => 30,
214
+ :L => 3,
215
+ :K => 3,
216
+ :M => 1200,
217
+ :F => 3,
218
+ :P => 1200,
219
+ :S => 1200,
220
+ :T => 1200,
221
+ :W => 3,
222
+ :Y => 10,
223
+ :V => 1200
224
+ }
225
+ }
226
+
227
+ ## TOP-IDP
228
+ ##
229
+ ## http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2676888/
230
+ ##
231
+ # TOP_IDP = {
232
+ # :I => -0.486,
233
+ # :V => -0.121,
234
+ # :L => -0.326,
235
+ # :F => -0.697,
236
+ # :C => 0.02,
237
+ # :M => -0.397,
238
+ # :A => 0.06,
239
+ # :G => 0.166,
240
+ # :T => 0.059,
241
+ # :W => -0.884,
242
+ # :S => 0.341,
243
+ # :Y => -0.510,
244
+ # :P => 0.987,
245
+ # :H => 0.303,
246
+ # :E => 0.736,
247
+ # :Q => 0.318,
248
+ # :D => 0.192,
249
+ # :N => 0.007,
250
+ # :K => 0.586,
251
+ # :R => 0.180,
252
+ # :U => 0.02
253
+ # }
254
+
255
+ # Hydropathy values for amino acids {[12]}[rdoc-label:12].
256
+ HYDROPATHY = {
257
+ :I => 4.5 ,
258
+ :V => 4.2 ,
259
+ :L => 3.8 ,
260
+ :F => 2.8 ,
261
+ :C => 2.5 ,
262
+ :M => 1.9 ,
263
+ :A => 1.8 ,
264
+ :G => -0.4,
265
+ :T => -0.7,
266
+ :W => -0.9,
267
+ :S => -0.8,
268
+ :Y => -1.3,
269
+ :P => -1.6,
270
+ :H => -3.2,
271
+ :E => -3.5,
272
+ :Q => -3.5,
273
+ :D => -3.5,
274
+ :N => -3.5,
275
+ :K => -3.9,
276
+ :R => -4.5,
277
+ :U => 2.5
278
+ }
279
+
280
+ # {Average isotopic masses of amino acids}[http://web.expasy.org/findmod/findmod_masses.html#AA]
281
+ AVERAGE_MASS = {
282
+ :I => 113.1594,
283
+ :V => 99.1326,
284
+ :L => 113.1594,
285
+ :F => 147.1766,
286
+ :C => 103.1388,
287
+ :M => 131.1926,
288
+ :A => 71.0788,
289
+ :G => 57.0519,
290
+ :T => 101.1051,
291
+ :W => 186.2132,
292
+ :S => 87.0782,
293
+ :Y => 163.1760,
294
+ :P => 97.1167,
295
+ :H => 137.1411,
296
+ :E => 129.1155,
297
+ :Q => 128.1307,
298
+ :D => 115.0886,
299
+ :N => 114.1038,
300
+ :K => 128.1741,
301
+ :R => 156.1875,
302
+ :U => 150.0388
303
+ }
304
+ WATER_MASS = 18.01524
305
+
306
+ # Atomic composition of amino acids.
307
+ ATOM = {
308
+ :I => {:C => 6, :H => 13, :O => 2, :N => 1, :S => 0}, # C6H13NO2
309
+ :V => {:C => 5, :H => 11, :O => 2, :N => 1, :S => 0}, # C5H11NO2
310
+ :L => {:C => 6, :H => 13, :O => 2, :N => 1, :S => 0}, # C6H13NO2
311
+ :F => {:C => 9, :H => 11, :O => 2, :N => 1, :S => 0}, # C9H11NO2
312
+ :C => {:C => 3, :H => 7 , :O => 2, :N => 1, :S => 1}, # C3H7NO2S
313
+ :M => {:C => 5, :H => 11 ,:O => 2, :N => 1, :S => 1}, # C5H11NO2S
314
+ :A => {:C => 3, :H => 7 , :O => 2, :N => 1, :S => 0}, # C3H7NO2
315
+ :G => {:C => 2, :H => 5 , :O => 2, :N => 1, :S => 0}, # C2H5NO2
316
+ :T => {:C => 4, :H => 9 , :O => 3, :N => 1, :S => 0}, # C4H9NO3
317
+ :W => {:C => 11,:H => 12, :O => 2, :N => 2, :S => 0}, # C11H12N2O2
318
+ :S => {:C => 3, :H => 7 , :O => 3, :N => 1, :S => 0}, # C3H7NO3
319
+ :Y => {:C => 9, :H => 11, :O => 3, :N => 1, :S => 0}, # C9H11NO3
320
+ :P => {:C => 5, :H => 9 , :O => 2, :N => 1, :S => 0}, # C5H9NO2
321
+ :H => {:C => 6, :H => 9 , :O => 2, :N => 3, :S => 0}, # C6H9N3O2
322
+ :E => {:C => 5, :H => 9 , :O => 4, :N => 1, :S => 0}, # C5H9NO4
323
+ :Q => {:C => 5, :H => 10, :O => 3, :N => 2, :S => 0}, # C5H10N2O3
324
+ :D => {:C => 4, :H => 7 , :O => 4, :N => 1, :S => 0}, # C4H7NO4
325
+ :N => {:C => 4, :H => 8 , :O => 3, :N => 2, :S => 0}, # C4H8N2O3
326
+ :K => {:C => 6, :H => 14, :O => 2, :N => 2, :S => 0}, # C6H14N2O2
327
+ :R => {:C => 6, :H => 14, :O => 2, :N => 4, :S => 0}, # C6H14N4O2
328
+ }
329
+
330
+ ##
331
+ #
332
+ # pK value from Bjellqvist, et al {[13]}[rdoc-label:13].
333
+ # Taking into account the decrease in pK differences
334
+ # between acids and bases when going from water
335
+ # to 8 M urea, a value of 7.5 has been assigned to the
336
+ # N-terminal residue.
337
+ #
338
+ PK = {
339
+ :cterm => {
340
+ :normal => 3.55, :D => 4.55, :E => 4.75
341
+ },
342
+ :nterm => {
343
+ :A => 7.59, :M => 7.00, :S => 6.93, :P => 8.36,
344
+ :T => 6.82, :V => 7.44, :E => 7.70 , :G => 7.50
345
+ },
346
+ :internal => {
347
+ :D => 4.05, :E => 4.45, :H => 5.98, :C => 9.0,
348
+ :Y => 10.0, :K => 10.0, :R => 12.0
349
+ }
350
+ }
351
+
352
+ def initialize(seq)
353
+ if seq.kind_of?(String) && Bio::Sequence.guess(seq) == Bio::Sequence::AA
354
+ # TODO: has issue.
355
+ @seq = Bio::Sequence::AA.new seq
356
+ elsif seq.kind_of? Bio::Sequence::AA
357
+ @seq = seq
358
+ elsif seq.kind_of?(Bio::Sequence) &&
359
+ seq.guess.kind_of?(Bio::Sequence::AA)
360
+ @seq = seq.guess
361
+ else
362
+ raise ArgumentError, "sequence must be an AA sequence"
363
+ end
364
+ end
365
+
366
+ ##
367
+ #
368
+ # Return the number of negative amino acids (D and E) in an AA sequence.
369
+ #
370
+ def num_neg
371
+ @num_neg ||= @seq.count("DE")
372
+ end
373
+
374
+ ##
375
+ #
376
+ # Return the number of positive amino acids (R and K) in an AA sequence.
377
+ #
378
+ def num_pos
379
+ @num_pos ||= @seq.count("RK")
380
+ end
381
+
382
+ ##
383
+ #
384
+ # Return the number of residues in an AA sequence.
385
+ #
386
+ def amino_acid_number
387
+ @seq.length
388
+ end
389
+
390
+ ##
391
+ #
392
+ # Return the number of atoms in a sequence. If type is given, return the
393
+ # number of specific atoms in a sequence.
394
+ #
395
+ def total_atoms(type=nil)
396
+ if !type.nil?
397
+ type = type.to_sym
398
+ if /^(?:C|H|O|N|S){1}$/ !~ type.to_s
399
+ raise ArgumentError, "type must be C/H/O/N/S/nil(all)"
400
+ end
401
+ end
402
+ num_atom = {:C => 0,
403
+ :H => 0,
404
+ :O => 0,
405
+ :N => 0,
406
+ :S => 0}
407
+ each_aa do |aa|
408
+ ATOM[aa].each do |t, num|
409
+ num_atom[t] += num
410
+ end
411
+ end
412
+ num_atom[:H] = num_atom[:H] - 2 * (amino_acid_number - 1)
413
+ num_atom[:O] = num_atom[:O] - (amino_acid_number - 1)
414
+ if type.nil?
415
+ num_atom.values.inject(0){|sum, num| sum + num }
416
+ else
417
+ num_atom[type]
418
+ end
419
+ end
420
+
421
+ ##
422
+ #
423
+ # Return the number of carbons.
424
+ #
425
+ def num_carbon
426
+ @num_carbon ||= total_atoms :C
427
+ end
428
+
429
+ def num_hydrogen
430
+ @num_hydrogen ||= total_atoms :H
431
+ end
432
+
433
+ ##
434
+ #
435
+ # Return the number of nitrogens.
436
+ #
437
+ def num_nitro
438
+ @num_nitro ||= total_atoms :N
439
+ end
440
+
441
+ ##
442
+ #
443
+ # Return the number of oxygens.
444
+ #
445
+ def num_oxygen
446
+ @num_oxygen ||= total_atoms :O
447
+ end
448
+
449
+ ##
450
+ #
451
+ # Return the number of sulphurs.
452
+ #
453
+ def num_sulphur
454
+ @num_sulphur ||= total_atoms :S
455
+ end
456
+
457
+ ##
458
+ #
459
+ # Calculate molecular weight of an AA sequence.
460
+ #
461
+ # _Protein Mw is calculated by the addition of average isotopic masses of
462
+ # amino acids in the protein and the average isotopic mass of one water
463
+ # molecule._
464
+ #
465
+ def molecular_weight
466
+ @mw ||= begin
467
+ mass = WATER_MASS
468
+ each_aa do |aa|
469
+ mass += AVERAGE_MASS[aa.to_sym]
470
+ end
471
+ (mass * 10).floor().to_f / 10
472
+ end
473
+ end
474
+
475
+ ##
476
+ #
477
+ # Calculate theoretical pI for an AA sequence with the bisection algorithm.
478
+ # pK values by Bjellqvist, et al. are used to calculate pI.
479
+ #
480
+ def theoretical_pI
481
+ charges = []
482
+ residue_count().each do |residue|
483
+ charges << charge_proc(residue[:positive],
484
+ residue[:pK],
485
+ residue[:num])
486
+ end
487
+ round(solve_pI(charges), 2)
488
+ end
489
+
490
+ ##
491
+ #
492
+ # Return estimated half_life of an AA sequence.
493
+ #
494
+ # _The half-life is a prediction of the time it takes for half of the
495
+ # amount of protein in a cell to disappear after its synthesis in the
496
+ # cell. ProtParam relies on the "N-end rule", which relates the half-life
497
+ # of a protein to the identity of its N-terminal residue; the prediction
498
+ # is given for 3 model organisms (human, yeast and E.coli)._
499
+ #
500
+ def half_life(species=nil)
501
+ n_end = @seq[0].chr.to_sym
502
+ if species
503
+ HALFLIFE[species][n_end]
504
+ else
505
+ {
506
+ :ecoli => HALFLIFE[:ecoli][n_end],
507
+ :mammalian => HALFLIFE[:mammalian][n_end],
508
+ :yeast => HALFLIFE[:yeast][n_end]
509
+ }
510
+ end
511
+ end
512
+
513
+ ##
514
+ #
515
+ # Calculate instability index of an AA sequence.
516
+ #
517
+ # _The instability index provides an estimate of the stability of your
518
+ # protein in a test tube. Statistical analysis of 12 unstable and 32
519
+ # stable proteins has revealed [10] that there are certain dipeptides, the
520
+ # occurrence of which is significantly different in the unstable proteins
521
+ # compared with those in the stable ones. The authors of this method have
522
+ # assigned a weight value of instability to each of the 400 different
523
+ # dipeptides (DIWV)._
524
+ #
525
+ def instability_index
526
+ @instability_index ||=
527
+ begin
528
+ instability_sum = 0.0
529
+ i = 0
530
+ while @seq[i+1] != nil
531
+ aa, next_aa = [@seq[i].chr.to_sym, @seq[i+1].chr.to_sym]
532
+ if DIWV.key?(aa) && DIWV[aa].key?(next_aa)
533
+ instability_sum += DIWV[aa][next_aa]
534
+ end
535
+ i += 1
536
+ end
537
+ round((10.0/amino_acid_number.to_f) * instability_sum, 2)
538
+ end
539
+ end
540
+
541
+ ##
542
+ #
543
+ # Return whether the sequence is stable or not as a String (stable/unstable).
544
+ #
545
+ # _A protein whose instability index is smaller than 40 is predicted as
546
+ # stable, a value above 40 predicts that the protein may be unstable._
547
+ #
548
+ #
549
+ def stability
550
+ (instability_index <= 40) ? "stable" : "unstable"
551
+ end
552
+
553
+ ##
554
+ #
555
+ # Return true if the sequence is stable.
556
+ #
557
+ def stable?
558
+ instability_index <= 40
559
+ end
560
+
561
+ ##
562
+ #
563
+ # Calculate aliphatic index of an AA sequence.
564
+ #
565
+ # _The aliphatic index of a protein is defined as the relative volume
566
+ # occupied by aliphatic side chains (alanine, valine, isoleucine, and
567
+ # leucine). It may be regarded as a positive factor for the increase of
568
+ # thermostability of globular proteins._
569
+ #
570
+ def aliphatic_index
571
+ aa_map = aa_comp_map
572
+ @aliphatic_index ||= round(aa_map[:A] +
573
+ 2.9 * aa_map[:V] +
574
+ (3.9 * (aa_map[:I] + aa_map[:L])), 2)
575
+ end
576
+
577
+ ##
578
+ #
579
+ # Calculate GRAVY score of an AA sequence.
580
+ #
581
+ # _The GRAVY(Grand Average of Hydropathy) value for a peptide or protein
582
+ # is calculated as the sum of hydropathy values [12] of all the amino acids,
583
+ # divided by the number of residues in the sequence._
584
+ #
585
+ def gravy
586
+ @gravy ||= begin
587
+ hydropathy_sum = 0.0
588
+ each_aa do |aa|
589
+ hydropathy_sum += HYDROPATHY[aa]
590
+ end
591
+ round(hydropathy_sum / @seq.length.to_f, 3)
592
+ end
593
+ end
594
+
595
+ ##
596
+ #
597
+ # Calculate the percentage composition of an AA sequence as a Hash object.
598
+ # It returns the percentage of a given amino acid if aa_code is not nil.
599
+ #
600
+ def aa_comp(aa_code=nil)
601
+ if aa_code.nil?
602
+ aa_map = {}
603
+ IUPAC_CODE.keys.each do |k|
604
+ aa_map[k] = 0.0
605
+ end
606
+ aa_map.update(aa_comp_map){|k,_,v| round(v, 1) }
607
+ else
608
+ round(aa_comp_map[aa_code], 1)
609
+ end
610
+ end
611
+
612
+ private
613
+
614
+ def aa_comp_map
615
+ @aa_comp_map ||=
616
+ begin
617
+ aa_map = {}
618
+ aa_comp = {}
619
+ sum = 0
620
+ each_aa do |aa|
621
+ if aa_map.key? aa
622
+ aa_map[aa] += 1
623
+ else
624
+ aa_map[aa] = 1
625
+ end
626
+ sum += 1
627
+ end
628
+ aa_map.each {|aa, count| aa_comp[aa] = (Rational(count,sum) * 100).to_f }
629
+ aa_comp
630
+ end
631
+ end
632
+
633
+ def each_aa
634
+ @seq.each_byte do |x|
635
+ yield x.chr.to_sym
636
+ end
637
+ end
638
+
639
+ def positive? residue
640
+ (residue == "H" || residue == "R" || residue == "K")
641
+ end
642
+
643
+ #
644
+ # Return a proc that calculates the charge of a residue at a given pH.
645
+ #
646
+ def charge_proc positive, pK, num
647
+ if positive
648
+ lambda {|ph|
649
+ num.to_f / (1.0 + 10.0 ** (ph - pK))
650
+ }
651
+ else
652
+ lambda {|ph|
653
+ (-1.0 * num.to_f) / (1.0 + 10.0 ** (pK - ph))
654
+ }
655
+ end
656
+ end
657
+
658
+ #
659
+ # Transform AA sequence into residue count
660
+ #
661
+ def residue_count
662
+ counted = []
663
+ # N-terminal
664
+ n_term = @seq[0].chr
665
+ if PK[:nterm].key? n_term.to_sym
666
+ counted << {
667
+ :num => 1,
668
+ :residue => n_term.to_sym,
669
+ :pK => PK[:nterm][n_term.to_sym],
670
+ :positive => positive?(n_term)
671
+ }
672
+ elsif PK[:internal].key? n_term.to_sym # side-chain pK when no N-terminal value is tabulated
673
+ counted << {
674
+ :num => 1,
675
+ :residue => n_term.to_sym,
676
+ :pK => PK[:internal][n_term.to_sym],
677
+ :positive => positive?(n_term)
678
+ }
679
+ end
680
+ # Internal
681
+ tmp_internal = {}
682
+ @seq[1,(@seq.length-2)].each_byte do |x|
683
+ aa = x.chr.to_sym
684
+ if PK[:internal].key? aa
685
+ if tmp_internal.key? aa
686
+ tmp_internal[aa][:num] += 1
687
+ else
688
+ tmp_internal[aa] = {
689
+ :num => 1,
690
+ :residue => aa,
691
+ :pK => PK[:internal][aa],
692
+ :positive => positive?(aa.to_s)
693
+ }
694
+ end
695
+ end
696
+ end
697
+ tmp_internal.each do |aa, val|
698
+ counted << val
699
+ end
700
+ # C-terminal
701
+ c_term = @seq[-1].chr
702
+ if PK[:cterm].key? c_term.to_sym
703
+ counted << {
704
+ :num => 1,
705
+ :residue => c_term.to_sym,
706
+ :pK => PK[:cterm][c_term.to_sym],
707
+ :positive => positive?(c_term)
708
+ }
709
+ end
710
+ counted
711
+ end
712
+
713
+ #
714
+ # Solve for the pI value with the bisection algorithm.
715
+ #
716
+ def solve_pI charges
717
+ state = {
718
+ :ph => 0.0,
719
+ :charges => charges,
720
+ :pI => nil,
721
+ :ph_prev => 0.0,
722
+ :ph_next => 14.0,
723
+ :net_charge => 0.0
724
+ }
725
+ error = false
726
+ # epsilon is the precision: pI = pH ± epsilon
727
+ epsilon = 0.001
728
+
729
+ loop do
730
+ # Reset net charge
731
+ state[:net_charge] = 0.0
732
+ # Calculate net charge
733
+ state[:charges].each do |charge_proc|
734
+ state[:net_charge] += charge_proc.call state[:ph]
735
+ end
736
+
737
+ # Something is wrong - pH is higher than 14
738
+ if state[:ph] >= 14.0
739
+ error = true
740
+ break
741
+ end
742
+
743
+ # Making decision
744
+ temp_ph = 0.0
745
+ if state[:net_charge] <= 0.0
746
+ temp_ph = state[:ph]
747
+ state[:ph] = state[:ph] - ((state[:ph] - state[:ph_prev]) / 2.0)
748
+ state[:ph_next] = temp_ph
749
+ else
750
+ temp_ph = state[:ph]
751
+ state[:ph] = state[:ph] + ((state[:ph_next] - state[:ph]) / 2.0)
752
+ state[:ph_prev] = temp_ph
753
+ end
754
+
755
+ if (state[:ph] - state[:ph_prev] < epsilon) &&
756
+ (state[:ph_next] - state[:ph] < epsilon)
757
+ state[:pI] = state[:ph]
758
+ break
759
+ end
760
+ end
761
+
762
+ if !state[:pI].nil? && !error
763
+ state[:pI]
764
+ else
765
+ raise "Failed to Calc pI: pH is higher than 14"
766
+ end
767
+ end
768
+
769
+ def round(num, ndigits=0)
770
+ (num * (10 ** ndigits)).round().to_f / (10 ** ndigits).to_f
771
+ end
772
+
773
+ # --------------------------------
774
+ # :section: References
775
+ #
776
+ #
777
+ # 1. Protein Identification and Analysis Tools on the ExPASy Server;
778
+ # Gasteiger E., Hoogland C., Gattiker A., Duvaud S., Wilkins M.R.,
779
+ # Appel R.D., Bairoch A.; (In) John M. Walker (ed): The Proteomics
780
+ # Protocols Handbook, Humana Press (2005). pp. 571-607
781
+ # 2. Pace, C.N., Vajdos, F., Fee, L., Grimsley, G., and Gray, T. (1995)
782
+ # How to measure and predict the molar absorption coefficient of a
783
+ # protein. Protein Sci. 4, 2411-2423.
784
+ # 3. Edelhoch, H. (1967) Spectroscopic determination of tryptophan and
785
+ # tyrosine in proteins. Biochemistry 6, 1948-1954.
786
+ # 4. Gill, S.C. and von Hippel, P.H. (1989) Calculation of protein
787
+ # extinction coefficients from amino acid sequence data. Anal. Biochem.
788
+ # 182, 319-326.
789
+ # 5. Bachmair, A., Finley, D. and Varshavsky, A. (1986) In vivo half-life
790
+ # of a protein is a function of its amino-terminal residue. Science 234,
791
+ # 179-186.
792
+ # 6. Gonda, D.K., Bachmair, A., Wunning, I., Tobias, J.W., Lane, W.S. and
793
+ # Varshavsky, A. J. (1989) Universality and structure of the N-end rule.
794
+ # J. Biol. Chem. 264, 16700-16712.
795
+ # 7. Tobias, J.W., Shrader, T.E., Rocap, G. and Varshavsky, A. (1991) The
796
+ # N-end rule in bacteria. Science 254, 1374-1377.
797
+ # 8. Ciechanover, A. and Schwartz, A.L. (1989) How are substrates
798
+ # recognized by the ubiquitin-mediated proteolytic system? Trends Biochem.
799
+ # Sci. 14, 483-488.
800
+ # 9. Varshavsky, A. (1997) The N-end rule pathway of protein degradation.
801
+ # Genes Cells 2, 13-28.
802
+ # 10. Guruprasad, K., Reddy, B.V.B. and Pandit, M.W. (1990) Correlation
803
+ # between stability of a protein and its dipeptide composition: a novel
804
+ # approach for predicting in vivo stability of a protein from its primary
805
+ # sequence. Protein Eng. 4, 155-161.
806
+ # 11. Ikai, A.J. (1980) Thermostability and aliphatic index of globular
807
+ # proteins. J. Biochem. 88, 1895-1898.
808
+ # 12. Kyte, J. and Doolittle, R.F. (1982) A simple method for displaying
809
+ # the hydropathic character of a protein. J. Mol. Biol. 157, 105-132.
810
+ # 13. Bjellqvist, B., Hughes, G.J., Pasquali, Ch., Paquet, N., Ravier, F.,
811
+ # Sanchez, J.-Ch., Frutiger, S. & Hochstrasser, D.F. The focusing positions
812
+ # of polypeptides in immobilized pH gradients can be predicted from their
813
+ # amino acid sequences. Electrophoresis 1993, 14, 1023-1031.
814
+ #
815
+ # --------------------------------
816
+ end
817
+ end
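The pI search in `solve_pI` is a bisection over pH 0 to 14 on the net charge built from the `charge_proc` lambdas. The following is a minimal sketch of that idea, not the gem's implementation. The names `net_charge` and `sketch_pi` and the two ionizable groups (pK 9.0 basic, pK 3.0 acidic) are hypothetical values chosen for illustration, not entries from the `PK` table.

```ruby
# Each ionizable group is [pK, count, positive?]. These two groups are
# made-up illustration values, not taken from the gem's PK table.
groups = [
  [9.0, 1, true],  # hypothetical basic group
  [3.0, 1, false]  # hypothetical acidic group
]

# Net charge at a given pH, using the same per-group formulas as charge_proc.
def net_charge(groups, ph)
  groups.inject(0.0) do |sum, (pk, count, positive)|
    if positive
      sum + count / (1.0 + 10.0**(ph - pk))
    else
      sum - count / (1.0 + 10.0**(pk - ph))
    end
  end
end

# Bisect [0, 14] until the bracket is narrower than epsilon, then report
# the midpoint rounded to two decimal places.
def sketch_pi(groups, epsilon = 0.001)
  low = 0.0
  high = 14.0
  while high - low > epsilon
    mid = (low + high) / 2.0
    if net_charge(groups, mid) > 0.0
      low = mid   # still positively charged: pI lies at a higher pH
    else
      high = mid  # neutral or negative: pI lies at or below this pH
    end
  end
  ((low + high) / 2.0).round(2)
end

puts sketch_pi(groups) # prints 6.0
```

With one basic and one acidic group, the net charge is zero where the two terms cancel, which is halfway between the pK values (pH 6.0). The net charge decreases monotonically with pH, which is why a plain bisection is sufficient.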