ulla 0.9.9.1 → 0.9.9.2

@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+ metadata.gz: 40c1f0642b5169fc2e4e54c12c2e7b88e06a9f7f
+ data.tar.gz: 4ac3ec4031a177629ae72cf5de1803e3749bbf66
+ SHA512:
+ metadata.gz: 050a3eb74dc176396b8ef9d1f10bee85ce227e9694245776d2c0b2de92595ed919697fd79176e25c0ebeb203f1a5cc689f15f3ad6580609c8d0839b944627550
+ data.tar.gz: d05f5de53ed2ac6552af88153c932f9947d47db5dfdcbe9ef09547f54a924ff8eff1fa8cee2dbde4101f40064e5f0299208977cfff80ce239b930d9b3bcc33a4
@@ -1,7 +1,7 @@
  == 0.9.9.1 26/08/2009
 
  * Removed dependency on the Ruby Facets library
- * Total -> total for compativility with melody
+ * Total -> total for compatibility with melody
 
  == 0.9.9 09/08/2009
 
@@ -18,6 +18,10 @@ lib/ulla/environment_class_hash.rb
  lib/ulla/environment_feature.rb
  lib/ulla/environment_feature_array.rb
  lib/ulla/heatmap_array.rb
+ lib/ulla/esst.rb
+ lib/ulla/essts.rb
+ lib/ulla/joy_tem.rb
+ lib/ulla/sequence.rb
  script/console
  script/destroy
  script/generate
@@ -1,5 +1,5 @@
  HELP/OPTIONS: ulla -h
 
- For more information on ulla, see http://ulla.rubyforge.org
+ For more information on ulla, see http://github.com/semin/ulla.
 
 
@@ -25,7 +25,6 @@ Following RubyGems will be automatically installed if you have rubygems installe
 
  * narray (http://narray.rubyforge.org)
  * bio (http://bioruby.open-bio.org)
- * Active Support (http://as.rubyonrails.org)
  * RMagick (http://rmagick.rubyforge.org)
 
 
@@ -36,7 +35,7 @@ Following RubyGems will be automatically installed if you have rubygems installe
 
  == Basic Usage
 
- It's pretty much the same as Kenji's subst (http://www-cryst.bioc.cam.ac.uk/~kenji/subst/), so in most cases, you can swap 'subst' with 'ulla'.
+ It's pretty much the same as Kenji's subst (http://mordred.bioc.cam.ac.uk/~kenji/subst/), so in most cases, you can swap 'subst' with 'ulla'.
 
  ~user $ ulla -l TEMLIST-file -c classdef.dat
  or
@@ -98,7 +97,7 @@ It's pretty much the same as Kenji's subst (http://www-cryst.bioc.cam.ac.uk/~ken
 
  == Usage
 
- 1. Prepare an environmental class definition file. For more details, please check this notes (http://www-cryst.bioc.cam.ac.uk/~kenji/subst/NOTES). You can download a sample environmental class definition file from http://www-cryst.bioc.cam.ac.uk/~kenji/subst/classdef.dat
+ 1. Prepare an environmental class definition file. For more details, please check these notes (http://mordred.bioc.cam.ac.uk/~kenji/subst/NOTES). You can download a sample environmental class definition file from http://mordred.bioc.cam.ac.uk/~kenji/subst/classdef.dat
 
  ~user $ cat classdef.dat
  #
@@ -108,7 +107,7 @@ It's pretty much the same as Kenji's subst (http://www-cryst.bioc.cam.ac.uk/~ken
  secondary structure and phi angle;HEPC;HEPC;T;F
  solvent accessibility;TF;Aa;F;F
 
- 2. Prepare structural alignments and their annotations of above environmental classes in PIR format. You can download sample alignments from http://www-cryst.bioc.cam.ac.uk/~kenji/subst/alltem-allmask.tar.gz
+ 2. Prepare structural alignments, annotated with the environmental classes defined above, in PIR format. You can download sample alignments from http://mordred.bioc.cam.ac.uk/~kenji/subst/alltem-allmask.tar.gz or from http://www-cryst.bioc.cam.ac.uk/ESST/
 
  ~user $ cat sample1.tem
  >P1;1mnma
@@ -167,7 +166,7 @@ It's pretty much the same as Kenji's subst (http://www-cryst.bioc.cam.ac.uk/~ken
 
  9. In case positions are masked with the character 'X' in any environmental features, all mutations from/to the position will be excluded from substitution counts.
 
- 10. Then, it will produce a file containing all the matrices, which will look like the one below. For more details, please check this notes (http://www-cryst.bioc.cam.ac.uk/~kenji/subst/NOTES).
+ 10. Then, it will produce a file containing all the matrices, which will look like the one below. For more details, please check these notes (http://mordred.bioc.cam.ac.uk/~kenji/subst/NOTES).
 
  # Environment-specific amino acid substitution matrices
  # Creator: ulla version 0.0.5
@@ -226,7 +225,7 @@ It's pretty much the same as Kenji's subst (http://www-cryst.bioc.cam.ac.uk/~ken
 
  which will look like this,
 
- http://www-cryst.bioc.cam.ac.uk/~semin/images/0.HA.png
+ http://mordred.bioc.cam.ac.uk/~semin/images/0.HA.png
 
  12. To generate one big figure, 'myheatmaps.gif' containing all the heat maps (4 maps in a row),
 
@@ -234,7 +233,7 @@ It's pretty much the same as Kenji's subst (http://www-cryst.bioc.cam.ac.uk/~ken
 
  which will look like this,
 
- http://www-cryst.bioc.cam.ac.uk/~semin/images/myheatmaps.gif
+ http://mordred.bioc.cam.ac.uk/~semin/images/myheatmaps.gif
 
  == Repository
 
data/Rakefile CHANGED
@@ -2,6 +2,7 @@ require 'rubygems'
  gem 'hoe', '>= 2.1.0'
  require 'hoe'
  require 'fileutils'
+ require './lib/ulla.rb'
 
  Hoe.plugin :newgem
  # Hoe.plugin :website
@@ -12,7 +13,6 @@ Hoe.plugin :newgem
  $hoe = Hoe.spec 'ulla' do
  self.developer 'Semin Lee', 'seminlee@gmail.com'
  self.post_install_message = 'PostInstall.txt' # TODO remove if post-install message not required
- self.rubyforge_name = self.name # TODO this is default value
  self.extra_deps = [
  ['narray', '>= 0.5.9.5'],
  ['bio', '>= 1.2.1'],
@@ -21,4 +21,4 @@ $hoe = Hoe.spec 'ulla' do
  end
 
  require 'newgem/tasks'
- Dir['tasks/**/*.rake'].each { |t| load t }
+ Dir['tasks/*.rake'].each { |t| load t }
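The Rakefile now loads `tasks/*.rake` instead of `tasks/**/*.rake`. The new pattern only finds task files directly under `tasks/`, while the old one also searched subdirectories. Below is a minimal sketch of that difference. It assumes Ruby 2.5+ for the `base:` option of `Dir.glob`; the `rake_files` helper and the temporary directory layout are hypothetical.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: list files under root that match a glob pattern.
def rake_files(root, pattern)
  Dir.glob(pattern, base: root).sort
end

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'tasks', 'sub'))
  FileUtils.touch(File.join(dir, 'tasks', 'top.rake'))
  FileUtils.touch(File.join(dir, 'tasks', 'sub', 'nested.rake'))
  p rake_files(dir, 'tasks/*.rake')    # prints ["tasks/top.rake"]
  p rake_files(dir, 'tasks/**/*.rake') # prints ["tasks/sub/nested.rake", "tasks/top.rake"]
end
```

Any `.rake` file kept in a subdirectory of `tasks/` would therefore no longer be loaded by the new Rakefile.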
@@ -1,6 +1,3 @@
- require 'rubygems'
- require 'narray'
-
  module NArrayExtensions
 
  def pretty_string(options={})
@@ -1,14 +1,3 @@
- require 'rubygems'
- require 'narray'
-
- begin
- require 'rvg/rvg'
- include Magick
- rescue Exception => e
- $logger.warn "#{e.to_s.chomp} For this reason, heat maps cannot be generated."
- $no_rmagick = true
- end
-
  module NMatrixExtensions
 
  def pretty_string(options={})
@@ -1,6 +1,50 @@
  $:.unshift(File.dirname(__FILE__)) unless
  $:.include?(File.dirname(__FILE__)) || $:.include?(File.expand_path(File.dirname(__FILE__)))
 
+ require 'bio'
+ require 'set'
+ require 'logger'
+ require 'narray'
+ require 'rubygems'
+ require 'bio'
+ require 'set'
+ require 'inline'
+ require 'narray'
+ require 'logger'
+ require 'narray'
+ require 'stringio'
+ require 'pathname'
+ require 'getoptlong'
+ require 'fork_manager'
+ require 'facets/enumerable'
+
+ begin
+ require 'rvg/rvg'
+ include Magick
+ rescue Exception => e
+ $logger.warn "#{e.to_s.chomp} For this reason, heat maps cannot be generated."
+ $no_rmagick = true
+ end
+
+ require_relative 'math_extensions'
+ require_relative 'array_extensions'
+ require_relative 'string_extensions'
+ require_relative 'narray_extensions'
+ require_relative 'nmatrix_extensions'
+
+ require_relative 'ulla/esst'
+ require_relative 'ulla/essts'
+ require_relative 'ulla/joy_tem'
+ require_relative 'ulla/sequence'
+ require_relative 'ulla/heatmap_array'
+ require_relative 'ulla/environment'
+ require_relative 'ulla/environment_class_hash'
+ require_relative 'ulla/environment_feature'
+ require_relative 'ulla/environment_feature_array'
+
  module Ulla
- VERSION = '0.9.9.1'
+ VERSION = '0.9.9.2'
+
+ $logger = Logger.new(STDOUT)
+ $logger.level = Logger::WARN
  end
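In the new `lib/ulla.rb`, the RMagick `rescue` branch calls `$logger.warn`, but `$logger` is only assigned further down, inside `module Ulla`. Unless something else has set it first, it would still be `nil` at that point. Below is a minimal sketch of the same optional-dependency pattern with two changes: the logger is created before the load is attempted, and only `LoadError` is rescued. The `optional_require` helper is hypothetical and not part of ulla.

```ruby
require 'logger'

# Hypothetical helper: load an optional library and return true on success.
# On LoadError it logs a warning and returns false instead of aborting.
def optional_require(lib, logger)
  require lib
  true
rescue LoadError => e
  logger.warn "#{e.message.chomp} For this reason, heat maps cannot be generated."
  false
end

# Create the logger before attempting the optional load.
logger = Logger.new($stdout)
logger.level = Logger::WARN

# 'rvg/rvg' is the RMagick entry point that lib/ulla.rb tries to load.
heatmaps_available = optional_require('rvg/rvg', logger)
puts "heat maps available: #{heatmaps_available}"
```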
@@ -1,15 +1,86 @@
- require 'rubygems'
- require 'getoptlong'
- require 'logger'
- require 'narray'
- require 'bio'
- require 'set'
-
  # This is a module for an actual command line interpreter for Ulla
  # ---
  # Copyright (C) 2008-9 Semin Lee
  module Ulla
  class CLI
+
+ # Calculate PID between two sequences
+ #
+ # :call-seq:
+ # Ulla::CLI::calculate_pid(seq1, seq2, unit) -> Float
+ #
+ def self.calculate_pid_rb(seq1, seq2, unit)
+ aas1 = seq1.scan(/\S{#{unit}}/)
+ aas2 = seq2.scan(/\S{#{unit}}/)
+ gap = ($gap || '-') * unit
+ align = 0 # no. of aligned columns
+ ident = 0 # no. of identical columns
+ intgp = 0 # no. of internal gaps
+
+ if (aas1.size != aas2.size)
+ $logger.error "Cannot calculate PID between unaligned sequences"
+ $logger.error seq1, seq2
+ exit 1
+ end
+
+ (0...aas1.size).each do |i|
+ if (aas1[i] != gap) && (aas2[i] != gap)
+ align += 1
+ if aas1[i] == aas2[i]
+ ident += 1
+ end
+ elsif (((aas1[i] == gap) && (aas2[i] != gap)) ||
+ ((aas1[i] != gap) && (aas2[i] == gap)))
+ intgp += 1
+ end
+ end
+
+ 100.0 * ident / (align + intgp)
+ end
+
+
+ inline(:C) do |builder|
+ builder.add_compile_flags '-x c++', '-lstdc++'
+ builder.c_singleton %q{
+ static VALUE calculate_pid_cpp(VALUE seq1, VALUE seq2, VALUE unit) {
+ VALUE re = rb_str_plus(rb_str_plus(rb_str_new2("\\\\S{"), rb_funcall(unit, rb_intern("to_s"), 0)), rb_str_new2("}"));
+ VALUE aas1 = rb_funcall(seq1, rb_intern("scan"), 1, rb_reg_new_str(re, 0));
+ VALUE aas2 = rb_funcall(seq2, rb_intern("scan"), 1, rb_reg_new_str(re, 0));
+ //VALUE aas1 = rb_funcall(seq1, rb_intern("split"), 1, rb_str_new2(""));
+ //VALUE aas2 = rb_funcall(seq2, rb_intern("split"), 1, rb_str_new2(""));
+ VALUE *aas1_p = RARRAY_PTR(aas1);
+ VALUE *aas2_p = RARRAY_PTR(aas2);
+ VALUE gap = rb_str_new2("-");
+ long len1 = RARRAY_LEN(aas1);
+ //long len2 = RARRAY_LEN(aas2);
+ double align = 0.0;
+ double ident = 0.0;
+ double intgp = 0.0;
+
+ for (long i = 0; i < len1; i++) {
+ if ((rb_str_equal(aas1_p[i], gap) == Qfalse) && (rb_str_equal(aas2_p[i], gap) == Qfalse)) {
+ align += 1.0;
+ if (rb_str_equal(aas1_p[i], aas2_p[i]) == Qtrue) {
+ ident += 1.0;
+ }
+ } else if (((rb_str_equal(aas1_p[i], gap) == Qtrue) && (rb_str_equal(aas2_p[i], gap) == Qfalse)) ||
+ ((rb_str_equal(aas1_p[i], gap) == Qfalse) && (rb_str_equal(aas2_p[i], gap) == Qtrue))) {
+ intgp += 1.0;
+ }
+ }
+ return DBL2NUM(100.0 * ident / (align + intgp));
+ }
+ }
+ end
+
+ def self.calculate_pid(seq1, seq2, unit)
+ begin
+ self.calculate_pid_cpp(seq1, seq2, unit)
+ rescue
+ self.calculate_pid_rb(seq1, seq2, unit)
+ end
+ end
+
  class << self
 
  # :nodoc:
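The counting rules shared by `calculate_pid_rb` and `calculate_pid_cpp` can be restated as a standalone Ruby sketch. Each alignment column is `unit` characters wide. A column made only of gap characters counts as a gap. PID is 100 × identical / (aligned + one-sided gaps). Two departures from the original, both assumptions: the sketch raises `ArgumentError` for unaligned input instead of calling `exit 1`, and it returns 0.0 when the denominator is zero instead of dividing by zero. `percent_identity` is a hypothetical name.

```ruby
# Percent identity (PID) between two aligned sequences:
# PID = 100 * identical / (aligned + gapped)
def percent_identity(seq1, seq2, unit = 1, gap_char = '-')
  cols1 = seq1.scan(/\S{#{unit}}/) # split into unit-wide columns
  cols2 = seq2.scan(/\S{#{unit}}/)
  unless cols1.size == cols2.size
    raise ArgumentError, 'cannot calculate PID between unaligned sequences'
  end

  gap = gap_char * unit
  aligned = 0   # columns where neither sequence has a gap
  identical = 0 # aligned columns with identical content
  gapped = 0    # columns where exactly one sequence has a gap

  cols1.zip(cols2).each do |a, b|
    if a != gap && b != gap
      aligned += 1
      identical += 1 if a == b
    elsif a != b # one side is a gap and the other is not
      gapped += 1
    end
  end

  denominator = aligned + gapped
  denominator.zero? ? 0.0 : 100.0 * identical / denominator
end

puts percent_identity('AC-E', 'ACDE') # prints 75.0
```

In the example, three aligned columns are identical and one column has a gap on one side only, so PID = 100 × 3 / 4.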
@@ -38,7 +109,7 @@ Options:
  --tem-file (-f) FILE: a tem file
  --tem-list (-l) FILE: a list for tem files
  --classdef (-c) FILE: a file for the defintion of environmental class
- if no definition file provided, --cys (-y) 2 and --nosmooth options automatcially applied
+ if no definition file is provided, the --cys (-y) 2 and --nosmooth options are applied
  --outfile (-o) FILE: output filename (default 'allmat.dat')
  --weight (-w) INTEGER: clustering level (PID) for the BLOSUM-like weighting (default: 60)
  --noweight: calculate substitution counts with no weights
@@ -58,13 +129,13 @@ Options:
  0 for raw counts (no smoothing performed)
  1 for probabilities
  2 for log-odds (default)
- --noroundoff: do not round off log odds ratio
- --scale INTEGER: log-odds matrices in 1/n bit units (default 3)
- --sigma DOUBLE: change the sigma value for smoothing (default 5.0)
+ --noroundoff: do not round off log-odds ratio
+ --scale INTEGER: log-odds matrices in 1/n bit units (default: 3)
+ --sigma DOUBLE: change the sigma value for smoothing (default: 5.0)
  --autosigma: automatically adjust the sigma value for smoothing
- --add DOUBLE: add this value to raw counts when deriving log-odds without smoothing (default 0)
- --pidmin DOUBLE: count substitutions only for pairs with PID equal to or greater than this value (default none)
- --pidmax DOUBLE: count substitutions only for pairs with PID smaller than this value (default none)
+ --add DOUBLE: add this value to raw counts when deriving log-odds without smoothing (default: 0)
+ --pidmin DOUBLE: count substitutions only for pairs with PID equal to or greater than this value
+ --pidmax DOUBLE: count substitutions only for pairs with PID smaller than this value
  --heatmap INTEGER:
  0 create a heat map file for each substitution table
  1 create one big file containing all heat maps from substitution tables
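Taken together, the `--pidmin` and `--pidmax` descriptions define a half-open interval: a pair is counted only when pidmin ≤ PID < pidmax. Below is a minimal sketch of that filter. It assumes `nil` (the default for `$pidmin` and `$pidmax` among the CLI globals) means that side is unbounded. `pid_in_range?` is a hypothetical helper.

```ruby
# Count a pair only when pidmin <= pid < pidmax; a nil bound is unbounded.
def pid_in_range?(pid, pidmin = nil, pidmax = nil)
  return false if pidmin && pid < pidmin  # --pidmin: equal to or greater than
  return false if pidmax && pid >= pidmax # --pidmax: strictly smaller than
  true
end

puts pid_in_range?(60.0, 40.0, 60.0) # prints false
```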
@@ -91,35 +162,6 @@ Options:
  puts (verbose ? usage + options : usage)
  end
 
- # Calculate PID between two sequences
- #
- # :call-seq:
- # Ulla::CLI::calculate_pid(seq1, seq2) -> Float
- #
- def calculate_pid(seq1, seq2, unit)
- aas1 = seq1.scan(/\w{#{unit}}/)
- aas2 = seq2.scan(/\w{#{unit}}/)
- cols = aas1.zip(aas2)
- gap = ($gap || '-') * unit
- align = 0 # no. of aligned columns
- ident = 0 # no. of identical columns
- intgp = 0 # no. of internal gaps
-
- cols.each do |col|
- if (col[0] != gap) && (col[1] != gap)
- align += 1
- if col[0] == col[1]
- ident += 1
- end
- elsif (((col[0] == gap) && (col[1] != gap)) ||
- ((col[0] != gap) && (col[1] == gap)))
- intgp += 1
- end
- end
-
- pid = 100.0 * ident.to_f / (align + intgp)
- end
-
  # :nodoc:
  def execute(arguments=[])
  #
@@ -152,9 +194,6 @@ Options:
  # Global variables and their default values
  #
 
- $logger = Logger.new(STDOUT)
- $logger.level = Logger::WARN
-
  # default set of 21 amino acids including J (Cysteine, the free thiol form)
  $amino_acids = 'ACDEFGHIKLMNPQRSTVWYJ'.split('')
  $gap = '-'
@@ -179,7 +218,6 @@ Options:
  $scale = 3
  $pidmin = nil
  $pidmax = nil
- $scale = 3
  $add = nil
  $cys = 0
  $targetenv = false
@@ -233,6 +271,9 @@ Options:
  [ '--noroundoff', GetoptLong::NO_ARGUMENT ],
  [ '--sigma', GetoptLong::REQUIRED_ARGUMENT ],
  [ '--autosigma', GetoptLong::NO_ARGUMENT ],
+ [ '--scale', GetoptLong::REQUIRED_ARGUMENT ],
+ [ '--pidmax', GetoptLong::REQUIRED_ARGUMENT ],
+ [ '--pidmin', GetoptLong::REQUIRED_ARGUMENT ],
  [ '--add', GetoptLong::REQUIRED_ARGUMENT ],
  [ '--heatmap', GetoptLong::REQUIRED_ARGUMENT ],
  [ '--heatmap-stem', GetoptLong::REQUIRED_ARGUMENT ],
@@ -297,7 +338,7 @@ Options:
  when '--penv'
  warn "--penv option is not supported."
  exit 1
- $penv = true
+ #$penv = true
  when '--heatmap'
  $heatmap = case arg.to_i
  when (0..2) then arg.to_i
@@ -365,19 +406,6 @@ Options:
  warn "Cannot find environment class definition file, #{$classdef}"
  exit 1
  end
-
- require 'math_extensions'
- require 'array_extensions'
- require 'string_extensions'
- require 'narray_extensions'
- require 'nmatrix_extensions'
-
- require 'ulla/environment'
- require 'ulla/environment_class_hash'
- require 'ulla/environment_feature'
- require 'ulla/environment_feature_array'
- require 'ulla/heatmap_array'
-
  #
  # Part 2 END
  #
@@ -425,15 +453,18 @@ Options:
  next
  elsif (env_ftr = line.split(/;/)).length == 5
  $logger.info "An environment feature, #{line} detected."
+
  if env_ftr[-1] == 'T'
  # skip silenced environment feature
  $logger.warn "The environment feature, #{line} silent."
  next
  end
+
  if env_ftr[-2] == 'T'
  $cst_features << env_index
  $logger.warn "The environment feature, #{line} constrained."
  end
+
  $env_features << EnvironmentFeature.new(env_ftr[0],
  env_ftr[1].split(''),
  env_ftr[2].split(''),
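The classdef parsing loop splits each `classdef.dat` line on `;` and expects five fields. A `T` in the last field silences the feature; a `T` in the fourth field marks it as constrained. Below is a sketch of that parse as a standalone function. Several details are assumptions for illustration: the `EnvFeature` struct, its field names (`symbols` for the second field, `labels` for the third), and the skipping of blank and `#` lines.

```ruby
# One environment feature from classdef.dat:
#   name;symbols;labels;constrained(T/F);silent(T/F)
EnvFeature = Struct.new(:name, :symbols, :labels, :constrained, :silent)

def parse_classdef_line(line)
  line = line.strip
  return nil if line.empty? || line.start_with?('#') # blank line or comment
  fields = line.split(';')
  return nil unless fields.length == 5
  name, symbols, labels, constrained, silent = fields
  EnvFeature.new(name, symbols.chars, labels.chars, constrained == 'T', silent == 'T')
end

f = parse_classdef_line('secondary structure and phi angle;HEPC;HEPC;T;F')
puts f.symbols.inspect # prints ["H", "E", "P", "C"]
```

With the README's sample file, `solvent accessibility;TF;Aa;F;F` parses as an unconstrained, non-silent feature with labels `A` and `a`.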
@@ -571,7 +602,7 @@ Options:
  seq2 = seq2.split('').each_with_index.map { |aa, pos| aa == $gap ? $ext_gap : env_labels[id2][pos] }.join
  end
 
- pid = calculate_pid(seq1, seq2, $col_size)
+ pid = calculate_pid_cpp(seq1, seq2, $col_size)
  s1 = seq1.scan(/\S{#{$col_size}}/)
  s2 = seq2.scan(/\S{#{$col_size}}/)
 
@@ -610,8 +641,10 @@ Options:
  next
  end
 
- aa1 = (disulphide.has_key?(id1) && (disulphide[id1][pos] == 'F') && (aa1[0].chr == 'C') && ($cys != 2)) ? 'J' + aa1[1..-1] : aa1
- aa2 = (disulphide.has_key?(id2) && (disulphide[id2][pos] == 'F') && (aa2[0].chr == 'C') && ($cys != 2)) ? 'J' + aa2[1..-1] : aa2
+ #aa1 = (disulphide.has_key?(id1) && (disulphide[id1][pos] == 'F') && (aa1[0].chr == 'C') && ($cys != 2)) ? 'J' + aa1[1..-1] : aa1
+ #aa2 = (disulphide.has_key?(id2) && (disulphide[id2][pos] == 'F') && (aa2[0].chr == 'C') && ($cys != 2)) ? 'J' + aa2[1..-1] : aa2
+ aa1 = (aa1[0].chr == 'C' && (!disulphide.has_key?(id1) || disulphide[id1][pos] == 'F') && $cys != 2) ? 'J' + aa1[1..-1] : aa1
+ aa2 = (aa2[0].chr == 'C' && (!disulphide.has_key?(id2) || disulphide[id2][pos] == 'F') && $cys != 2) ? 'J' + aa2[1..-1] : aa2
  env_label = $environment == 1 ? aa1 + '-' + aa2[1..-1] : env_labels[id1][pos]
 
  if $cst_features.empty?
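The new cysteine rule relabels `C` as `J` (the free thiol form in `$amino_acids`) unless the position is annotated as disulphide-bonded. It now also relabels when the structure has no disulphide annotation at all, and `$cys == 2` disables the relabelling. Below is a sketch of the rule. It assumes the per-position annotation is `'F'` for a free cysteine and treats any other value as bonded. `relabel_cysteine` is hypothetical.

```ruby
# 'C' becomes 'J' (free thiol) unless annotated as disulphide-bonded.
# disulphide_flag: per-position flag ('F' = free, anything else = bonded),
# or nil when the structure has no disulphide annotation.
# cys_mode 2 disables relabelling.
def relabel_cysteine(column, disulphide_flag, cys_mode)
  return column unless column[0] == 'C' && cys_mode != 2
  return column unless disulphide_flag.nil? || disulphide_flag == 'F'

  'J' + column[1..-1] # keep any environment labels after the residue
end

puts relabel_cysteine('CHa', 'F', 0) # prints JHa
```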
@@ -648,7 +681,7 @@ Options:
  ali = ext_ali
  end
 
- # a loop for single linkage clustering
+ # loop for single linkage clustering
  begin
  continue = false
  0.upto(clusters.size - 2) do |i|
@@ -657,7 +690,7 @@ Options:
  found = false
  clusters[i].each do |c1|
  clusters[j].each do |c2|
- if calculate_pid(ali[c1], ali[c2], $col_size) >= $weight
+ if calculate_pid_cpp(ali[c1], ali[c2], $col_size) >= $weight
  indexes << j
  found = true
  break
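The clustering loop merges clusters by single linkage: two clusters join as soon as any pair of their members has PID at or above `$weight`. Below is a sketch of that procedure. It assumes a caller-supplied `pid` callable, for example a lambda wrapping `Ulla::CLI.calculate_pid`. `single_linkage` is hypothetical and favours clarity over speed.

```ruby
# Single-linkage clustering at a PID threshold: clusters are merged while any
# member of one has pid >= threshold with any member of the other.
# Returns clusters as arrays of indexes into `items`.
def single_linkage(items, threshold, pid)
  clusters = items.each_index.map { |i| [i] } # start from singletons
  loop do
    pair = (0...clusters.size).to_a.combination(2).find do |i, j|
      clusters[i].product(clusters[j]).any? do |a, b|
        pid.call(items[a], items[b]) >= threshold
      end
    end
    break unless pair

    i, j = pair
    clusters[i] += clusters[j]
    clusters.delete_at(j) # j > i, so clusters[i] keeps its position
  end
  clusters
end

# Toy example: numbers stand in for sequences, and "PID" falls with distance.
toy_pid = ->(a, b) { 100.0 - (a - b).abs }
p single_linkage([0, 8, 16, 60], 90, toy_pid) # prints [[0, 1, 2], [3]]
```

The toy example shows the chaining typical of single linkage: items 0 and 16 end up in the same cluster through item 8, even though their own "PID" (84) is below the threshold. Each pass rescans every cluster pair, so this sketch is only suitable for small alignments.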
@@ -694,12 +727,12 @@ Options:
  seq1.each_with_index do |aa1, pos|
  aa2 = seq2[pos]
 
- if env_labels[id1][pos].include?('X')
+ if env_labels.has_key?(id1) && env_labels[id1][pos].include?('X')
  $logger.debug "All substitutions from #{id1}-#{pos}-#{aa1[0].chr} are masked."
  next
  end
 
- if env_labels[id2][pos].include?('X')
+ if env_labels.has_key?(id2) && env_labels[id2][pos].include?('X')
  $logger.debug "All substitutions to #{id2}-#{pos}-#{aa2[0].chr} are masked."
  next
  end
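The masking checks now skip the `'X'` test when a sequence has no entry in `env_labels`, so unlabelled sequences are no longer looked up blindly. Below is a sketch of the rule. It assumes, as in the surrounding code, that `env_labels` maps a sequence id to an array of per-position label strings. `masked?` and `countable?` are hypothetical helpers.

```ruby
# True when the environment label of sequence `id` at `pos` contains 'X'.
# Sequences with no entry in env_labels are treated as unmasked.
def masked?(env_labels, id, pos)
  env_labels.key?(id) && env_labels[id][pos].include?('X')
end

# A column pair is counted only if neither side is masked.
def countable?(env_labels, id1, id2, pos)
  !masked?(env_labels, id1, pos) && !masked?(env_labels, id2, pos)
end

labels = { 's1' => ['HAa', 'XAa'], 's2' => ['HAa', 'HAa'] }
puts countable?(labels, 's1', 's2', 1) # prints false
```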
@@ -714,18 +747,21 @@ Options:
  next
  end
 
- aa1 = (disulphide.has_key?(id1) && (disulphide[id1][pos] == 'F') && (aa1[0].chr == 'C') && ($cys != 2)) ? 'J' + aa1[1..-1] : aa1
- aa2 = (disulphide.has_key?(id2) && (disulphide[id2][pos] == 'F') && (aa2[0].chr == 'C') && ($cys != 2)) ? 'J' + aa2[1..-1] : aa2
- cnt1 = 1.0 / cluster1.size.to_f
- cnt2 = 1.0 / cluster2.size.to_f
+ #aa1 = (disulphide.has_key?(id1) && (disulphide[id1][pos] == 'F') && (aa1[0].chr == 'C') && ($cys != 2)) ? 'J' + aa1[1..-1] : aa1
+ #aa2 = (disulphide.has_key?(id2) && (disulphide[id2][pos] == 'F') && (aa2[0].chr == 'C') && ($cys != 2)) ? 'J' + aa2[1..-1] : aa2
+ #aa1 = (aa1[0].chr == 'C' && (!disulphide.has_key?(id1) || disulphide[id1][pos] == 'F') && $cys != 2) ? 'J' + aa1[1..-1] : aa1
+ #aa2 = (aa2[0].chr == 'C' && (!disulphide.has_key?(id2) || disulphide[id2][pos] == 'F') && $cys != 2) ? 'J' + aa2[1..-1] : aa2
+ cnt1 = 1.0 / cluster1.size
+ cnt2 = 1.0 / cluster2.size
  jnt_cnt = cnt1 * cnt2
  env_label1 = $environment == 1 ? aa1 + '-' + aa2[1..-1] : env_labels[id1][pos]
  env_label2 = $environment == 1 ? aa2 + '-' + aa1[1..-1] : env_labels[id2][pos]
 
  if $cst_features.empty?
- $env_classes[env_label1].increase_residue_count(aa2[0].chr, jnt_cnt) #rescue $logger.error "Something wrong with #{tem_file}-#{id2}-#{pos}-#{aa2}-#{env_label2}"
- $env_classes[env_label2].increase_residue_count(aa1[0].chr, jnt_cnt) #rescue $logger.error "Something wrong with #{tem_file}-#{id2}-#{pos}-#{aa2}-#{env_label2}"
- elsif (env_labels[id1][pos].split('').values_at(*$cst_features) == env_labels[id2][pos].split('').values_at(*$cst_features))
+ $env_classes[env_label1].increase_residue_count(aa2[0].chr, jnt_cnt)
+ $env_classes[env_label2].increase_residue_count(aa1[0].chr, jnt_cnt)
+ elsif (env_labels[id1][pos].split('').values_at(*$cst_features) ==
+ env_labels[id2][pos].split('').values_at(*$cst_features))
  $env_classes[env_label1].increase_residue_count(aa2[0].chr, jnt_cnt)
  $env_classes[env_label2].increase_residue_count(aa1[0].chr, jnt_cnt)
  else
@@ -735,8 +771,71 @@ Options:
 
  $aa_tot_cnt.has_key?(aa1) ? $aa_tot_cnt[aa1] += cnt1 : $aa_tot_cnt[aa1] = cnt1
  $aa_tot_cnt.has_key?(aa2) ? $aa_tot_cnt[aa2] += cnt2 : $aa_tot_cnt[aa2] = cnt2
- $aa_mut_cnt.has_key?(aa1) ? $aa_mut_cnt[aa1] += cnt1 : $aa_mut_cnt[aa1] = cnt1 if aa1 == aa2
- $aa_mut_cnt.has_key?(aa2) ? $aa_mut_cnt[aa2] += cnt2 : $aa_mut_cnt[aa2] = cnt2 if aa1 == aa2
+ ($aa_mut_cnt.has_key?(aa1) ? $aa_mut_cnt[aa1] += cnt1 : $aa_mut_cnt[aa1] = cnt1) if aa1 != aa2
+ ($aa_mut_cnt.has_key?(aa2) ? $aa_mut_cnt[aa2] += cnt2 : $aa_mut_cnt[aa2] = cnt2) if aa1 != aa2
+
+ #if $cst_features.empty?
+ #if $env_classes.has_key?(env_label1)
+ #if $env_classes.has_key?(env_label2)
+ #$env_classes[env_label1].increase_residue_count(aa2[0].chr, jnt_cnt)
+ #else
+ #if (aa1 == 'C' && aa2 == 'J')
+ #$env_classes[env_label1].increase_residue_count('C', jnt_cnt)
+ #else
+ #$env_classes[env_label1].increase_residue_count(aa2, jnt_cnt)
+ #end
+ #end
+ #end
+ #if $env_classes.has_key?(env_label2)
+ #if $env_classes.has_key?(env_label1)
+ #$env_classes[env_label2].increase_residue_count(aa1[0].chr, jnt_cnt)
+ #else
+ #if (aa2 == 'C' && aa1 == 'J')
+ #$env_classes[env_label2].increase_residue_count('C', jnt_cnt)
+ #else
+ #$env_classes[env_label2].increase_residue_count(aa1, jnt_cnt)
+ #end
+ #end
+ #end
+ #elsif (env_labels[id1][pos].split('').values_at(*$cst_features) == env_labels[id2][pos].split('').values_at(*$cst_features))
+ #$env_classes[env_label1].increase_residue_count(aa2[0].chr, jnt_cnt)
+ #$env_classes[env_label2].increase_residue_count(aa1[0].chr, jnt_cnt)
+ #else
+ #$logger.debug "Skipped #{id1}-#{pos}-#{aa1[0].chr} and #{id2}-#{pos}-#{aa2[0].chr} having different symbols for constrained environment features each other."
+ #next
+ #end
+
+ #if $env_classes.has_key?(env_label1)
+ #$aa_tot_cnt.has_key?(aa1) ? $aa_tot_cnt[aa1] += cnt1 : $aa_tot_cnt[aa1] = cnt1
+
+ #if $env_classes.has_key?(env_label2)
+ #if aa1[0].chr != aa2[0].chr
+ #$aa_mut_cnt.has_key?(aa1) ? $aa_mut_cnt[aa1] += cnt1 : $aa_mut_cnt[aa1] = cnt1
+ #end
+ #else
+ #if (aa1[0].chr != aa2)
+ #unless (aa1[0].chr == 'C' && aa2 == 'J') || (aa1[0].chr == 'J' && aa2 == 'C')
+ #$aa_mut_cnt.has_key?(aa1) ? $aa_mut_cnt[aa1] += cnt1 : $aa_mut_cnt[aa1] = cnt1
+ #end
+ #end
+ #end
+ #end
+
+ #if $env_classes.has_key?(env_label2)
+ #$aa_tot_cnt.has_key?(aa2) ? $aa_tot_cnt[aa2] += cnt2 : $aa_tot_cnt[aa2] = cnt2
+
+ #if $env_classes.has_key?(env_label1)
+ #if aa1[0].chr != aa2[0].chr
+ #$aa_mut_cnt.has_key?(aa2) ? $aa_mut_cnt[aa2] += cnt2 : $aa_mut_cnt[aa2] = cnt2
+ #end
+ #else
+ #if (aa1 != aa2[0].chr)
+ #unless (aa1 == 'J' && aa2[0].chr == 'C') || (aa1 == 'C' && aa2[0].chr == 'J')
+ #$aa_mut_cnt.has_key?(aa2) ? $aa_mut_cnt[aa2] += cnt2 : $aa_mut_cnt[aa2] = cnt2
+ #end
+ #end
+ #end
+ #end
 
  $logger.debug "#{id1}-#{pos}-#{aa1[0].chr} -> #{id2}-#{pos}-#{aa2[0].chr} substitution count (#{"%.2f" % jnt_cnt}) was added to the environments class, #{env_label1}."
  $logger.debug "#{id2}-#{pos}-#{aa2[0].chr} -> #{id1}-#{pos}-#{aa1[0].chr} substitution count (#{"%.2f" % jnt_cnt}) was added to the environments class, #{env_label2}."
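These tallies weight each residue by the inverse size of its cluster (`cnt1`, `cnt2`). With this change, a residue is added to the mutation count only when the paired residues differ (`aa1 != aa2`; the old code used `==`). Below is a sketch of one tally step that uses `Hash.new(0.0)` instead of the `has_key?` ternaries. `tally_pair` is hypothetical, and the hash keys are assumed to be single residue labels.

```ruby
# Add one aligned residue pair to the weighted tallies. Each residue is
# weighted by 1 / (size of its cluster); a residue counts as mutated only
# when the two residues differ. Returns the joint weight cnt1 * cnt2.
def tally_pair(tot_cnt, mut_cnt, aa1, aa2, cluster1_size, cluster2_size)
  cnt1 = 1.0 / cluster1_size
  cnt2 = 1.0 / cluster2_size
  tot_cnt[aa1] += cnt1
  tot_cnt[aa2] += cnt2
  if aa1 != aa2
    mut_cnt[aa1] += cnt1
    mut_cnt[aa2] += cnt2
  end
  cnt1 * cnt2
end

tot = Hash.new(0.0) # default value 0.0 replaces the has_key? ternaries
mut = Hash.new(0.0)
tally_pair(tot, mut, 'A', 'G', 1, 4)
puts "A: #{tot['A']}, G mutated: #{mut['G']}" # prints A: 1.0, G mutated: 0.25
```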
@@ -748,6 +847,13 @@ Options:
  $logger.info "Analysing #{tem_file} done."
  end
 
+ $tot_aa = $aa_tot_cnt.values.sum
+
+ if $tot_aa < 1
+ $logger.warn "No amino acid substitution counted!"
+ exit 1
+ end
+
  # print out default header
  $outfh.puts <<HEADER
  # Environment-specific amino acid substitution matrices
@@ -798,18 +904,15 @@ HEADER
 
  # calculate amino acid frequencies and mutabilities, and
  # print them as default statistics in the header part
+
+ # pre-calculate ALA's mutability
  if $environment == 0
- ala_factor = if $aa_tot_cnt['A'] == 0
- 0.0
- elsif $aa_mut_cnt['A'] == 0
- 0.0
- else
- 100.0 * $aa_tot_cnt['A'] / $aa_mut_cnt['A'].to_f
- end
+ ala_mutb = if $aa_tot_cnt['A'] == 0 then 0.0
+ elsif $aa_mut_cnt['A'] == 0 then 0.0
+ else $aa_mut_cnt['A'].to_f / $aa_tot_cnt['A']
+ end
  end
 
- $tot_aa = $aa_tot_cnt.values.sum
-
  $outfh.puts '#'
  $outfh.puts "# Total amino acid frequencies:\n"
 
@@ -843,8 +946,8 @@ HEADER
  end
 
  if $environment == 0
- $aa_mutb[aa] = ($aa_tot_cnt[aa] == 0) ? 1.0 : ($aa_mut_cnt[aa] / $aa_tot_cnt[aa].to_f)
- $aa_rel_mutb[aa] = $aa_mutb[aa] * ala_factor
+ $aa_mutb[aa] = ($aa_tot_cnt[aa] == 0) ? 0.0 : ($aa_mut_cnt[aa] / $aa_tot_cnt[aa].to_f)
+ $aa_rel_mutb[aa] = 100 * $aa_mutb[aa] / ala_mutb
  end
 
  $aa_tot_freq[aa] = ($aa_tot_cnt[aa] == 0) ? 0.0 : ($aa_tot_cnt[aa] / $tot_aa.to_f)
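The new relative mutability is `100 * $aa_mutb[aa] / ala_mutb`. This is algebraically the same as the old `$aa_mutb[aa] * ala_factor`, since `ala_factor` was 100 × total(A) / mutated(A). However, the new form produces Infinity or NaN when alanine was never mutated (`ala_mutb == 0.0`). Below is a sketch that computes frequencies, mutabilities and Ala-relative mutabilities from the weighted totals. It assumes a zero reference should yield 0.0, and it mirrors the `$tot_aa < 1` guard by raising `ArgumentError`. `mutability_table` is hypothetical.

```ruby
# Per-residue statistics from weighted totals (tot_cnt) and mutation counts
# (mut_cnt): frequency, mutability (mutated / total) and mutability relative
# to a reference residue scaled to 100. A zero reference yields 0.0.
def mutability_table(tot_cnt, mut_cnt, reference = 'A')
  total = tot_cnt.values.sum
  raise ArgumentError, 'no amino acid substitution counted' if total < 1

  mutb = tot_cnt.map do |aa, cnt|
    [aa, cnt.zero? ? 0.0 : mut_cnt.fetch(aa, 0.0) / cnt.to_f]
  end.to_h
  ref = mutb.fetch(reference, 0.0)

  tot_cnt.map do |aa, cnt|
    rel = ref.zero? ? 0.0 : 100.0 * mutb[aa] / ref
    [aa, { freq: cnt / total.to_f, mutability: mutb[aa], relative: rel }]
  end.to_h
end

stats = mutability_table({ 'A' => 4.0, 'G' => 2.0 }, { 'A' => 2.0, 'G' => 2.0 })
puts stats['G'][:relative] # prints 200.0
```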
@@ -866,7 +969,7 @@ HEADER
 
  if $noweight
  if $environment == 0
- $outfh.puts '# %-3s %9d %9d %5.2f %8d %8.4f' % columns
+ $outfh.puts '# %-3s %9d %9d %5.2f %8d %8.6f' % columns
  else
  $outfh.puts "# %-3s %-#{$env_features.size}s %9d %9d %8.4f" % columns
  end