semin-egor 0.9.1 → 0.9.2

data/README.markdown ADDED
@@ -0,0 +1,243 @@
+ # egor
+
+ [http://www-cryst.bioc.cam.ac.uk/egor](http://www-cryst.bioc.cam.ac.uk/egor "Egor Homepage")
+
+
+ ## Description
+
+ 'egor' is a program for calculating environment-specific substitution tables from user-provided environmental class definitions and sequence alignments annotated with those environmental classes.
+
+
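The Description says tables are computed per environment, where environments come from the class definition file. As an illustration only, the Ruby sketch below parses classdef lines in the `name;values;labels;constrained;silent` layout shown in step 1 of Usage and enumerates environments as the Cartesian product of each feature's class labels. `EnvFeature`, `parse_classdef`, `label_for`, and `environment_labels` are hypothetical names, not egor's API. Leaving silent (mask-only) features out of the labels is an assumption. The sketch targets a modern Ruby (2.7 or later, for `filter_map`), not the 1.8.7 minimum listed under Requirements.

```ruby
# Hypothetical sketch, not egor's actual code: read classdef lines of the form
#   name;values;labels;constrained;silent
# and enumerate environment labels as the Cartesian product of class labels.
EnvFeature = Struct.new(:name, :values, :labels, :constrained, :silent)

# Parse classdef text, skipping blank lines and '#' comment lines.
def parse_classdef(text)
  text.each_line.filter_map do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?("#")
    name, values, labels, constrained, silent = line.split(";")
    EnvFeature.new(name, values.chars, labels.chars,
                   constrained == "T", silent == "T")
  end
end

# Map a value as written in a .tem file (e.g. 'T') to its class label (e.g. 'A').
def label_for(feature, value)
  idx = feature.values.index(value)
  idx && feature.labels[idx]
end

# Assumption: silent features only act as masks, so they are left out of labels.
# Labels are concatenated in the order the features are listed.
def environment_labels(features)
  label_sets = features.reject(&:silent).map(&:labels)
  return [] if label_sets.empty?
  first, *rest = label_sets
  first.product(*rest).map(&:join)
end

classdef = <<~CLASSDEF
  secondary structure and phi angle;HEPC;HEPC;F;F
  solvent accessibility;TF;Aa;F;F
  hydrogen bond to DNA;TF;Hh;F;F
  water-mediated hydrogen bond to DNA;TF;Ww;F;F
  van der Waals contact to DNA;TF;Vv;F;F
CLASSDEF

features = parse_classdef(classdef)
envs = environment_labels(features)
puts envs.size   # 4 * 2 * 2 * 2 * 2 = 64
puts envs.first  # HAHWV
```

With the five features listed in the sample output header, this yields 64 labels starting with `HAHWV`, which agrees with "Total number of environments: 64" and the first table header `>HAHWV 0` in that sample.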
+ ## Features
+
+ * Environment-specific substitution table generation based on user-provided environmental class definitions
+ * Entropy-based smoothing procedures to cope with the sparse data problem
+ * BLOSUM-like weighting procedures using a PID threshold
+ * Heat map generation for substitution tables
+
+
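For the BLOSUM-like weighting listed under Features, a common reading of the BLOSUM procedure is: cluster sequences whose pairwise PID is at or above the threshold (single linkage), then let each cluster count as one sequence by giving each member a weight of 1/cluster size. The Ruby sketch below follows that reading and also encodes the documented `--pidmin`/`--pidmax` rule (pidmin ≤ PID < pidmax). It is a sketch under assumptions, not egor's implementation: the gap handling and PID denominator in `pid` are guesses, and `pid`, `clusters`, `cluster_weights`, and `pair_in_pid_range?` are hypothetical names.

```ruby
# Hypothetical sketch of BLOSUM-style weighting; not egor's implementation.

# Percent identity over columns where neither residue is a gap ('-').
# Assumption: egor may use a different denominator.
def pid(a, b)
  pairs = a.chars.zip(b.chars).reject { |x, y| y.nil? || x == "-" || y == "-" }
  return 0.0 if pairs.empty?
  100.0 * pairs.count { |x, y| x == y } / pairs.size
end

# Single-linkage clustering with union-find: two sequences share a cluster
# if they are linked by a chain of pairs with PID >= threshold.
def clusters(seqs, threshold)
  parent = (0...seqs.size).to_a
  root = lambda do |i|
    i = parent[i] while parent[i] != i
    i
  end
  seqs.each_index do |i|
    ((i + 1)...seqs.size).each do |j|
      parent[root.call(i)] = root.call(j) if pid(seqs[i], seqs[j]) >= threshold
    end
  end
  seqs.each_index.group_by { |i| root.call(i) }.values
end

# Each cluster counts as one sequence: its members share a total weight of 1.
def cluster_weights(seqs, threshold)
  weights = Array.new(seqs.size, 0.0)
  clusters(seqs, threshold).each do |members|
    members.each { |i| weights[i] = 1.0 / members.size }
  end
  weights
end

# --pidmin / --pidmax as documented: pidmin <= PID < pidmax; nil means no bound.
def pair_in_pid_range?(value, pidmin = nil, pidmax = nil)
  (pidmin.nil? || value >= pidmin) && (pidmax.nil? || value < pidmax)
end

seqs = ["ACDE", "ACDF", "WWWW"]
p pid(seqs[0], seqs[1])                 # => 75.0
p cluster_weights(seqs, 60)             # => [0.5, 0.5, 1.0]
p pair_in_pid_range?(75.0, 60, 80)      # => true
```

In the original BLOSUM scheme, pairs within one cluster are not counted and each cross-cluster pair contributes the product of the two sequences' weights; the README does not say whether egor does exactly the same.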
+ ## Requirements
+
+ * ruby 1.8.7 or above ([http://www.ruby-lang.org](http://www.ruby-lang.org "Ruby"))
+ * rubygems 1.2.0 or above ([http://rubyforge.org/projects/rubygems/](http://rubyforge.org/projects/rubygems "RubyGems"))
+
+ The following gems will be installed automatically if you have RubyGems installed on your machine:
+
+ * narray ([http://narray.rubyforge.org](http://narray.rubyforge.org "NArray"))
+ * facets ([http://facets.rubyforge.org](http://facets.rubyforge.org "Ruby Facets"))
+ * bio ([http://bioruby.open-bio.org](http://bioruby.open-bio.org "BioRuby"))
+ * RMagick ([http://rmagick.rubyforge.org/](http://rmagick.rubyforge.org/ "RMagick"))
+
+
+ ## Installation
+
+ ~user $ sudo gem install egor
+
+
+ ## Basic Usage
+
+ It's much the same as Kenji's subst (http://www-cryst.bioc.cam.ac.uk/~kenji/subst/), so in most cases you can simply replace 'subst' with 'egor'.
+
+ ~user $ egor -l TEMLIST-file -c classdef.dat
+ or
+ ~user $ egor -f TEM-file -c classdef.dat
+
+
+ ## Options
+ --tem-file (-f) FILE: a tem file
+ --tem-list (-l) FILE: a file listing tem files
+ --classdef (-c) FILE: a file for the definition of environments (default: 'classdef.dat')
+ --outfile (-o) FILE: output filename (default: 'allmat.dat')
+ --weight (-w) INTEGER: clustering level (PID) for the BLOSUM-like weighting (default: 60)
+ --noweight: calculate substitution counts without weights
+ --smooth (-s) INTEGER:
+ 0 for partial smoothing (default)
+ 1 for full smoothing
+ --p1smooth: perform smoothing for the p1 probability calculation when partial smoothing is used
+ --nosmooth: perform no smoothing operation
+ --cys (-y) INTEGER:
+ 0 for using C and J only for structure (default)
+ 1 for both structure and sequence
+ 2 for using only C for both (must be set when you have no 'disulphide' or 'disulfide' annotation in templates)
+ --output INTEGER:
+ 0 for raw count (no smoothing performed)
+ 1 for probabilities
+ 2 for log odds ratios (default)
+ --noroundoff: do not round off log odds ratios
+ --scale INTEGER: log odds ratio matrices in 1/n bit units (default 3)
+ --sigma DOUBLE: change the sigma value for smoothing (default 5.0)
+ --autosigma: automatically adjust the sigma value for smoothing
+ --add DOUBLE: add this value to raw counts when deriving log odds ratios without smoothing (default 1/#classes)
+ --pidmin DOUBLE: count substitutions only for pairs with PID equal to or greater than this value (default none)
+ --pidmax DOUBLE: count substitutions only for pairs with PID smaller than this value (default none)
+ --heatmap INTEGER:
+ 0 create a heat map file for each substitution table
+ 1 create one big file containing all heat maps from substitution tables
+ 2 do both 0 and 1
+ --heatmap-format INTEGER:
+ 0 for Portable Network Graphics (PNG) Format (default)
+ 1 for Graphics Interchange Format (GIF)
+ 2 for Joint Photographic Experts Group (JPEG) Format
+ 3 for Microsoft Windows bitmap (BMP) Format
+ 4 for Portable Document Format (PDF)
+ --heatmap-columns INTEGER: number of tables to print in a row when --heatmap 1 or 2 is set (default: sqrt(no. of tables))
+ --heatmap-stem STRING: stem for a file name when --heatmap 1 or 2 is set (default: 'heatmap')
+ --heatmap-value: print values in the cells when generating heat maps
+ --verbose (-v) INTEGER:
+ 0 for ERROR level
+ 1 for WARN or above level (default)
+ 2 for INFO or above level
+ 3 for DEBUG or above level
+ --version: print version
+ --help (-h): show help
+
+
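`--output 2` and `--scale` describe log odds ratios in 1/n bit units. One conventional way to compute such a score is n · log2(p_observed / p_background), rounded to an integer unless `--noroundoff` is given. The Ruby sketch below applies that formula to a single column of counts, with an optional pseudocount in the spirit of `--add`. It ignores smoothing and weighting entirely, `log_odds_column` and its keyword arguments are hypothetical, and which probabilities egor actually uses as numerator and background is not specified in this README.

```ruby
# Hypothetical sketch: convert one column of raw substitution counts into
# log-odds scores in 1/scale bit units (cf. --output 2, --scale, --add,
# --noroundoff). Smoothing is deliberately left out.
#
# counts:     { residue => raw (possibly weighted) count }
# background: { residue => background probability }
def log_odds_column(counts, background, scale: 3, add: 0.0, round: true)
  padded = counts.transform_values { |c| c + add } # pseudocount, like --add
  total = padded.values.sum.to_f
  padded.to_h do |aa, c|
    # log2(0) is -Infinity, so a zero count needs a positive pseudocount.
    raise ArgumentError, "zero count for #{aa}; use add > 0" if c <= 0
    bits = Math.log2((c / total) / background.fetch(aa))
    score = scale * bits
    [aa, round ? score.round : score]
  end
end

bg = { "A" => 0.5, "C" => 0.5 }
scores = log_odds_column({ "A" => 3, "C" => 1 }, bg)
puts scores["A"] # 3 * log2(0.75 / 0.5) ~ 1.75, rounded to 2
puts scores["C"] # 3 * log2(0.25 / 0.5) = -3
```

The explicit error on zero counts mirrors why an additive constant is needed when deriving log odds ratios without smoothing: an unobserved substitution would otherwise score negative infinity.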
+ ## Usage
+
+ 1. Prepare an environmental class definition file. For more details, please see these [notes](http://www-cryst.bioc.cam.ac.uk/~kenji/subst/NOTES "Kenji's NOTES").
+
+ ~user $ cat classdef.dat
+ #
+ # name of feature (string); values adopted in .tem file (string); class labels assigned for each value (string);
+ # constrained or not (T or F); silent (used as masks)? (T or F)
+ #
+ secondary structure and phi angle;HEPC;HEPC;T;F
+ solvent accessibility;TF;Aa;F;F
+
+ 2. Prepare structural alignments, together with their annotations for the above environmental classes, in [PIR format](http://caps.ncbs.res.in/gendis/pir.html "PIR Format").
+
+ ~user $ cat sample1.tem
+ >P1;1mnma
+ sequence
+ QKERRKIEIKFIENKTRRHVTFSKRKHGIMKKAFELSVLTGTQVLLLVVSETGLVYTFSTPKFEPIVTQQEGRNL
+ IQACLNAPDD*
+ >P1;1egwa
+ sequence
+ --GRKKIQITRIMDERNRQVTFTKRKFGLMKKAYELSVLCDCEIALIIFNSSNKLFQYASTDMDKVLLKYTEY--
+ ----------*
+ >P1;1mnma
+ secondary structure and phi angle
+ CPCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHPCCCEEEEECCCPCEEEEECCCCCHHHHCHHHHHH
+ HHHHHCCCCP*
+ >P1;1egwa
+ secondary structure and phi angle
+ --CCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHCPCCCEEEEECCCPCEEEEECCCHHHHHHHHHHC--
+ ----------*
+ >P1;1mnma
+ solvent accessibility
+ TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTFTTTTTTTTTTTTTTTT
+ TTTTTTTTTT*
+ >P1;1egwa
+ solvent accessibility
+ --TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTFTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT--
+ ----------*
+ ...
+
+ 3. If you have two or more alignment files, make a separate file listing the paths of all the alignment files.
+
+ ~user $ ls -1 *.tem > TEMLIST
+ ~user $ cat TEMLIST
+ sample1.tem
+ sample2.tem
+ ...
+
+ 4. To produce substitution count matrices, type
+
+ ~user $ egor -l TEMLIST --output 0 -o substcount.mat
+
+ 5. To produce substitution probability matrices, type
+
+ ~user $ egor -l TEMLIST --output 1 -o substprob.mat
+
+ 6. To produce log odds ratio matrices, type
+
+ ~user $ egor -l TEMLIST --output 2 -o substlogo.mat
+
+ 7. To produce substitution data only from sequence pairs within a given PID range, type the following (if you do not provide an output file name, 'allmat.dat' will be used):
+
+ ~user $ egor -l TEMLIST --pidmin 60 --pidmax 80 --output 1
+
+ 8. To change the clustering level (default 60), type
+
+ ~user $ egor -l TEMLIST --weight 80 --output 2
+
+ 9. Positions masked with the character 'X' in any environmental feature are excluded from the calculation of substitution counts.
+
+ 10. egor then produces a file containing all the matrices, which looks like the one below. For more details, please see these [notes](http://www-cryst.bioc.cam.ac.uk/~kenji/subst/NOTES "Kenji's NOTES").
+
+ #
+ # Environment-specific amino acid substitution matrices
+ # Creator: egor version 0.0.4
+ # Creation Date: 20/01/2009 14:45
+ #
+ # Definitions for structural environments:
+ # 5 features used
+ #
+ # secondary structure and phi angle;HEPC;HEPC;F;F
+ # solvent accessibility;TF;Aa;F;F
+ # hydrogen bond to DNA;TF;Hh;F;F
+ # water-mediated hydrogen bond to DNA;TF;Ww;F;F
+ # van der Waals contact to DNA;TF;Vv;F;F
+ #
+ # (read in from classdef.dat)
+ #
+ # Number of alignments: 86
+ # (list of .tem files read in from TEMLIST)
+ #
+ # Total number of environments: 64
+ #
+ # There are 21 amino acids considered.
+ # ACDEFGHIKLMNPQRSTVWYJ
+ #
+ # C: Cystine (the disulfide-bonded form)
+ # J: Cysteine (the free thiol form)
+ #
+ # Weighting scheme: clustering at PID 60 level
+ #
+ # ...
+ #
+ >HAHWV 0
+ # A C D E F G H I K L M N P Q R S T V W Y J
+ A 5 -6 0 0 -2 0 -2 -1 -1 -1 1 -1 -1 0 -1 1 0 0 -2 -2 -2
+ C -7 28 -8 -49 -3 -49 -2 -1 -11 -5 -1 -49 -6 -49 -49 -4 -6 -4 -49 3 9
+ D 0 -7 7 2 -3 0 0 -4 0 -3 -3 2 0 0 -2 1 0 -3 -5 -3 -6
+ E 0 -68 2 5 -3 -1 -1 -3 0 -3 -1 0 0 1 -1 0 0 -3 -3 -2 -7
+ F -2 -3 -3 -4 8 -4 -1 2 -4 1 0 -4 -4 -4 -4 -4 -2 1 2 3 -5
+ G 0 -67 0 -1 -4 9 -3 -4 -2 -3 -4 1 -1 -2 -3 0 -2 -3 -3 -3 -2
+ H -2 -2 0 -1 -1 -3 11 -3 -2 -3 -2 0 -2 -1 -1 -1 -1 -3 -2 0 -4
+ I -1 -1 -4 -3 2 -4 -3 6 -3 2 2 -4 -2 -2 -4 -3 -1 3 -1 0 -4
+ K -1 -10 0 0 -4 -2 -1 -3 5 -3 -2 0 0 1 2 -1 -1 -3 -4 -2 -5
+ L -1 -5 -3 -3 1 -3 -3 2 -3 5 2 -4 -1 -2 -2 -3 -1 1 0 -1 -4
+ M 1 -1 -3 -1 0 -4 -2 2 -3 2 8 -2 -2 -1 -2 -2 -1 1 -1 -1 -4
+ N -1 -66 2 0 -4 1 0 -4 0 -4 -2 8 -1 0 -1 1 0 -3 -5 -4 -5
+ P -1 -6 0 0 -3 -1 -2 -2 -1 -1 -2 -1 9 -1 -2 0 0 -2 -4 -3 -7
+ Q 0 -66 0 1 -4 -2 -1 -2 1 -2 -1 0 -1 6 0 0 -1 -2 -2 -2 -6
+ R -1 -69 -1 0 -4 -3 -1 -3 2 -2 -1 -1 -2 0 6 -1 -1 -3 -3 -2 -6
+ S 1 -4 1 0 -3 0 -1 -3 -1 -3 -2 1 0 0 -1 5 2 -2 -3 -1 -3
+ T 0 -5 -1 -1 -2 -2 -1 -1 -1 -1 -1 0 0 -1 -1 2 5 -1 -3 -2 -3
+ V 0 -4 -3 -3 1 -4 -3 3 -3 1 1 -3 -2 -2 -3 -2 -1 6 0 -1 -2
+ W -2 -61 -5 -3 2 -3 -2 -1 -4 0 -1 -5 -4 -2 -3 -3 -3 0 14 3 -6
+ Y -2 3 -3 -2 4 -3 0 0 -2 0 0 -4 -3 -2 -2 -1 -2 0 3 9 -3
+ J -3 9 -7 -8 -5 -2 -4 -4 -6 -4 -4 -5 -7 -6 -6 -3 -3 -2 -6 -3 15
+ U -3 15 -7 -8 -5 -3 -4 -4 -6 -4 -4 -5 -7 -6 -6 -3 -3 -2 -6 -3 15
+ ...
+
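Steps 2 and 9 of Usage describe the .tem input (PIR-style entries terminated by `*`, one entry per protein per annotation) and the rule that positions masked with 'X' in any environmental feature are excluded. The Ruby sketch below parses such entries and, for one protein code, lists the alignment columns left after masking. It is a simplified illustration, not egor's parser: `TemEntry`, `parse_tem`, and `unmasked_columns` are hypothetical names, and applying the mask per entry (rather than to both sequences of an aligned pair) is an assumption.

```ruby
# Hypothetical sketch, not egor's parser: read PIR-style .tem entries and find
# the alignment columns that are not masked with 'X' in any feature.
TemEntry = Struct.new(:code, :feature, :data)

def parse_tem(text)
  lines = text.each_line.map(&:chomp)
  entries = []
  until lines.empty?
    header = lines.shift
    next unless header.start_with?(">")   # skip stray lines such as "..."
    code = header.split(";", 2).last      # ">P1;1mnma" -> "1mnma"
    feature = lines.shift.to_s.strip      # "sequence" or a feature name
    chunks = []
    while (line = lines.shift)
      chunks << line.strip
      break if chunks.last.end_with?("*") # '*' terminates the entry
    end
    entries << TemEntry.new(code, feature, chunks.join.chomp("*"))
  end
  entries
end

# Columns usable for `code`: a column is dropped if any environmental
# feature annotation of that protein has 'X' there.
def unmasked_columns(entries, code)
  annotations = entries.select { |e| e.code == code && e.feature != "sequence" }
  width = annotations.map { |e| e.data.length }.max || 0
  (0...width).reject { |i| annotations.any? { |e| e.data[i] == "X" } }
end

tem = <<~TEM
  >P1;1abc
  sequence
  ACDE
  FG*
  >P1;1abc
  solvent accessibility
  TTXT
  TF*
TEM

entries = parse_tem(tem)
p entries.map(&:data)                # => ["ACDEFG", "TTXTTF"]
p unmasked_columns(entries, "1abc")  # => [0, 1, 3, 4, 5]
```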
+ ## Repository
+
+ You can download a pre-built RubyGems package from
+
+ * rubyforge: [http://rubyforge.org/projects/egor](http://rubyforge.org/projects/egor "RubyForge")
+
+ or you can fetch the source from
+
+ * github: [http://github.com/semin/egor/tree/master](http://github.com/semin/egor/tree/master "GitHub")
+
+
+ ## Contact
+
+ Comments are welcome; please send me an email (seminlee at gmail dot com).
+
+
+ ## License
+
+ <a rel="license" href="http://creativecommons.org/licenses/by-nc/2.0/uk/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc/2.0/uk/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/2.0/uk/">Creative Commons Attribution-Noncommercial 2.0 UK: England &amp; Wales License</a>.
data/egor.gemspec CHANGED
@@ -2,7 +2,7 @@
 
  Gem::Specification.new do |s|
  s.name = %q{egor}
- s.version = "0.9.1"
+ s.version = "0.9.2"
 
  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
  s.authors = ["Semin Lee"]
@@ -11,7 +11,7 @@ Gem::Specification.new do |s|
  s.email = ["seminlee@gmail.com"]
  s.executables = ["egor"]
  s.extra_rdoc_files = ["History.txt", "Manifest.txt", "PostInstall.txt", "website/index.txt"]
- s.files = ["History.txt", "Manifest.txt", "PostInstall.txt", "Rakefile", "bin/egor", "config/website.yml", "config/website.yml.sample", "egor.gemspec", "lib/egor.rb", "lib/egor/cli.rb", "lib/egor/environment.rb", "lib/egor/environment_class_hash.rb", "lib/egor/environment_feature.rb", "lib/egor/environment_feature_array.rb", "lib/egor/heatmap_array.rb", "lib/math_extensions.rb", "lib/narray_extensions.rb", "lib/nmatrix_extensions.rb", "lib/string_extensions.rb", "script/console", "script/destroy", "script/generate", "script/txt2html", "test/test_egor.rb", "test/test_egor_cli.rb", "test/test_egor_environment_class_hash.rb", "test/test_egor_environment_feature.rb", "test/test_helper.rb", "test/test_math_extensions.rb", "test/test_narray_extensions.rb", "test/test_nmatrix_extensions.rb", "test/test_string_extensions.rb", "website/index.html", "website/index.txt", "website/javascripts/rounded_corners_lite.inc.js", "website/stylesheets/screen.css", "website/template.html.erb"]
+ s.files = ["History.txt", "Manifest.txt", "PostInstall.txt", "README.markdown", "Rakefile", "bin/egor", "config/website.yml", "config/website.yml.sample", "egor.gemspec", "lib/egor.rb", "lib/egor/cli.rb", "lib/egor/environment.rb", "lib/egor/environment_class_hash.rb", "lib/egor/environment_feature.rb", "lib/egor/environment_feature_array.rb", "lib/egor/heatmap_array.rb", "lib/math_extensions.rb", "lib/narray_extensions.rb", "lib/nmatrix_extensions.rb", "lib/string_extensions.rb", "script/console", "script/destroy", "script/generate", "script/txt2html", "test/test_egor.rb", "test/test_egor_cli.rb", "test/test_egor_environment_class_hash.rb", "test/test_egor_environment_feature.rb", "test/test_helper.rb", "test/test_math_extensions.rb", "test/test_narray_extensions.rb", "test/test_nmatrix_extensions.rb", "test/test_string_extensions.rb", "website/index.html", "website/index.txt", "website/javascripts/rounded_corners_lite.inc.js", "website/stylesheets/screen.css", "website/template.html.erb"]
  s.has_rdoc = true
  s.post_install_message = %q{PostInstall.txt}
  s.rdoc_options = ["--main", "README.markdown"]
data/lib/egor/cli.rb CHANGED
@@ -72,7 +72,7 @@ Options:
  --pidmax DOUBLE: count substitutions only for pairs with PID smaller than this value (default none)
  --heatmap INTEGER:
  0 create a heat map file for each substitution table
- 1 create one big file containing all substitution tables
+ 1 create one big file containing all heat maps from substitution tables
  2 do both 0 and 1
  --heatmap-format INTEGER:
  0 for Portable Network Graphics (PNG) Format (default)
data/lib/egor.rb CHANGED
@@ -2,5 +2,5 @@ $:.unshift(File.dirname(__FILE__)) unless
  $:.include?(File.dirname(__FILE__)) || $:.include?(File.expand_path(File.dirname(__FILE__)))
 
  module Egor
- VERSION = '0.9.1'
+ VERSION = '0.9.2'
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: semin-egor
  version: !ruby/object:Gem::Version
- version: 0.9.1
+ version: 0.9.2
  platform: ruby
  authors:
  - Semin Lee
@@ -82,6 +82,7 @@ files:
  - History.txt
  - Manifest.txt
  - PostInstall.txt
+ - README.markdown
  - Rakefile
  - bin/egor
  - config/website.yml