semin-egor 0.9.1 → 0.9.2

data/README.markdown ADDED
@@ -0,0 +1,243 @@
1
+ # egor
2
+
3
+ [http://www-cryst.bioc.cam.ac.uk/egor](http://www-cryst.bioc.cam.ac.uk/egor "Egor Homepage")
4
+
5
+
6
+ ## Description
7
+
8
+ 'egor' is a program for calculating environment-specific substitution tables from user-provided environmental class definitions and sequence alignments annotated with those environment classes.
9
+
10
+
11
+ ## Features
12
+
13
+ * Environment-specific substitution table generation based on user-provided environmental class definitions
14
+ * Entropy-based smoothing procedures to cope with the sparse data problem
15
+ * BLOSUM-like weighting procedures using PID threshold
16
+ * Heat Map generation for substitution tables
17
+
18
+
19
+ ## Requirements
20
+
21
+ * ruby 1.8.7 or above ([http://www.ruby-lang.org](http://www.ruby-lang.org "Ruby"))
22
+ * rubygems 1.2.0 or above ([http://rubyforge.org/projects/rubygems/](http://rubyforge.org/projects/rubygems "RubyGems"))
23
+
24
+ The following RubyGems will be installed automatically if you have rubygems installed on your machine:
25
+
26
+ * narray ([http://narray.rubyforge.org](http://narray.rubyforge.org "NArray"))
27
+ * facets ([http://facets.rubyforge.org](http://facets.rubyforge.org "Ruby Facets"))
28
+ * bio ([http://bioruby.open-bio.org](http://bioruby.open-bio.org "BioRuby"))
29
+ * RMagick ([http://rmagick.rubyforge.org/](http://rmagick.rubyforge.org/ "RMagick"))
30
+
31
+
32
+ ## Installation
33
+
34
+ ~user $ sudo gem install egor
35
+
36
+
37
+ ## Basic Usage
38
+
39
+ It's pretty much the same as Kenji's subst (http://www-cryst.bioc.cam.ac.uk/~kenji/subst/), so in most cases, you can swap 'subst' with 'egor'.
40
+
41
+ ~user $ egor -l TEMLIST-file -c classdef.dat
42
+ or
43
+ ~user $ egor -f TEM-file -c classdef.dat
44
+
45
+
46
+ ## Options
47
+ --tem-file (-f) FILE: a tem file
48
+ --tem-list (-l) FILE: a list for tem files
49
+ --classdef (-c) FILE: a file for the definition of environments (default: 'classdef.dat')
50
+ --outfile (-o) FILE: output filename (default 'allmat.dat')
51
+ --weight (-w) INTEGER: clustering level (PID) for the BLOSUM-like weighting (default: 60)
52
+ --noweight: calculate substitution count with no weights
53
+ --smooth (-s) INTEGER:
54
+ 0 for partial smoothing (default)
55
+ 1 for full smoothing
56
+ --p1smooth: perform smoothing for p1 probability calculation when partial smoothing
57
+ --nosmooth: perform no smoothing operation
58
+ --cys (-y) INTEGER:
59
+ 0 for using C and J only for structure (default)
60
+ 1 for both structure and sequence
61
+ 2 for using only C for both (must be set when you have no 'disulphide' or 'disulfide' annotation in templates)
62
+ --output INTEGER:
63
+ 0 for raw count (no smoothing performed)
64
+ 1 for probabilities
65
+ 2 for log odds ratios (default)
66
+ --noroundoff: do not round off log odds ratio
67
+ --scale INTEGER: log odds ratio matrices in 1/n bit units (default 3)
68
+ --sigma DOUBLE: change the sigma value for smoothing (default 5.0)
69
+ --autosigma: automatically adjust the sigma value for smoothing
70
+ --add DOUBLE: add this value to raw count when deriving log odds ratios without smoothing (default 1/#classes)
71
+ --pidmin DOUBLE: count substitutions only for pairs with PID equal to or greater than this value (default none)
72
+ --pidmax DOUBLE: count substitutions only for pairs with PID smaller than this value (default none)
73
+ --heatmap INTEGER:
74
+ 0 create a heat map file for each substitution table
75
+ 1 create one big file containing all heat maps from substitution tables
76
+ 2 do both 0 and 1
77
+ --heatmap-format INTEGER:
78
+ 0 for Portable Network Graphics (PNG) Format (default)
79
+ 1 for Graphics Interchange Format (GIF)
80
+ 2 for Joint Photographic Experts Group (JPEG) Format
81
+ 3 for Microsoft Windows bitmap (BMP) Format
82
+ 4 for Portable Document Format (PDF)
83
+ --heatmap-columns INTEGER: number of tables to print in a row when --heatmap 1 or 2 is set (default: sqrt(no. of tables))
84
+ --heatmap-stem STRING: stem for a file name when --heatmap 1 or 2 is set (default: 'heatmap')
85
+ --heatmap-value: print values in the cells when generating heat maps
86
+ --verbose (-v) INTEGER
87
+ 0 for ERROR level
88
+ 1 for WARN or above level (default)
89
+ 2 for INFO or above level
90
+ 3 for DEBUG or above level
91
+ --version: print version
92
+ --help (-h): show help
93
+
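The `--scale` and `--noroundoff` options above describe log odds ratios expressed in 1/n bit units. The following is a minimal Ruby sketch of that idea under the common BLOSUM-style convention, score = n × log2(observed / background). The method name `log_odds_score` and its parameters are hypothetical, and egor's own calculation (which also involves smoothing) is not shown in this diff and may differ.

```ruby
# Convert an observed probability and a background probability into a
# log odds score in 1/n bit units (n = scale).
# Assumption: BLOSUM-style scaling; this is not taken from egor's source.
def log_odds_score(observed, background, scale: 3, round_off: true)
  score = scale * Math.log2(observed / background)
  # round_off: false mimics the idea behind --noroundoff (keep the fraction)
  round_off ? score.round : score
end

puts log_odds_score(0.2, 0.05)            # 3 * log2(4)
puts log_odds_score(0.1, 0.05, scale: 2)  # 2 * log2(2)
puts log_odds_score(0.05, 0.05)           # equal probabilities give zero
```

With the default scale of 3, a score of 3 corresponds to one bit, that is, a substitution observed twice as often as the background expectation.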
94
+
95
+ ## Usage
96
+
97
+ 1. Prepare an environmental class definition file. For more details, please check these [notes](http://www-cryst.bioc.cam.ac.uk/~kenji/subst/NOTES "Kenji's NOTES").
98
+
99
+ ~user $ cat classdef.dat
100
+ #
101
+ # name of feature (string); values adopted in .tem file (string); class labels assigned for each value (string);
102
+ # constrained or not (T or F); silent (used as masks)? (T or F)
103
+ #
104
+ secondary structure and phi angle;HEPC;HEPC;T;F
105
+ solvent accessibility;TF;Aa;F;F
106
+
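To make the semicolon-separated layout of `classdef.dat` concrete, here is a minimal Ruby sketch that reads such lines into records. It is not egor's own parser: the `Feature` struct and `parse_classdef` method are hypothetical names, and the environment count assumes that every combination of class labels across features forms one environment, ignoring the constrained and silent flags.

```ruby
# Hypothetical record for one feature line of a classdef.dat-style file.
Feature = Struct.new(:name, :symbols, :labels, :constrained, :silent)

# Parse feature definitions, skipping blank lines and '#' comments.
def parse_classdef(text)
  lines = text.each_line.map(&:strip)
  lines.reject { |l| l.empty? || l.start_with?('#') }.map do |line|
    name, symbols, labels, constrained, silent = line.split(';')
    Feature.new(name, symbols.chars, labels.chars,
                constrained == 'T', silent == 'T')
  end
end

classdef = <<~EOS
  #
  # name; values in .tem file; class labels; constrained?; silent?
  #
  secondary structure and phi angle;HEPC;HEPC;T;F
  solvent accessibility;TF;Aa;F;F
EOS

features = parse_classdef(classdef)
# Assumption: one environment per combination of class labels.
env_count = features.map { |f| f.labels.size }.inject(1) { |a, b| a * b }

features.each { |f| puts "#{f.name}: #{f.symbols.join} -> #{f.labels.join}" }
puts "environments: #{env_count}"
```

Under the same assumption, five features with 4, 2, 2, 2 and 2 labels give 64 combinations.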
107
+ 2. Prepare structural alignments, annotated with the environmental classes above, in [PIR format](http://caps.ncbs.res.in/gendis/pir.html "PIR Format").
108
+
109
+ ~user $ cat sample1.tem
110
+ >P1;1mnma
111
+ sequence
112
+ QKERRKIEIKFIENKTRRHVTFSKRKHGIMKKAFELSVLTGTQVLLLVVSETGLVYTFSTPKFEPIVTQQEGRNL
113
+ IQACLNAPDD*
114
+ >P1;1egwa
115
+ sequence
116
+ --GRKKIQITRIMDERNRQVTFTKRKFGLMKKAYELSVLCDCEIALIIFNSSNKLFQYASTDMDKVLLKYTEY--
117
+ ----------*
118
+ >P1;1mnma
119
+ secondary structure and phi angle
120
+ CPCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHPCCCEEEEECCCPCEEEEECCCCCHHHHCHHHHHH
121
+ HHHHHCCCCP*
122
+ >P1;1egwa
123
+ secondary structure and phi angle
124
+ --CCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHCPCCCEEEEECCCPCEEEEECCCHHHHHHHHHHC--
125
+ ----------*
126
+ >P1;1mnma
127
+ solvent accessibility
128
+ TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTFTTTTTTTTTTTTTTTT
129
+ TTTTTTTTTT*
130
+ >P1;1egwa
131
+ solvent accessibility
132
+ --TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTFTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT--
133
+ ----------*
134
+ ...
135
+
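The `.tem` layout above (a `>P1;ID` header, a line naming the feature, then wrapped data ending in `*`) can be read with a few lines of Ruby. This is only an illustrative sketch: `parse_tem` is a hypothetical helper rather than part of egor or BioRuby, and the input is a shortened stand-in for the sample file with made-up solvent accessibility values.

```ruby
# Parse PIR-like .tem text into { entry_id => { feature_name => data } }.
def parse_tem(text)
  entries = Hash.new { |hash, key| hash[key] = {} }
  id = nil
  feature = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty?
    if line.start_with?('>')
      id = line.split(';', 2).last      # ">P1;1mnma" -> "1mnma"
      feature = nil
    elsif feature.nil?
      feature = line                    # line after the header names the feature
      entries[id][feature] = String.new
    else
      entries[id][feature] << line.chomp('*')  # '*' terminates the record
    end
  end
  entries
end

tem = <<~EOS
  >P1;1mnma
  sequence
  QKERRKIEIK
  FIENK*
  >P1;1egwa
  sequence
  --GRKKIQIT
  RIMDE*
  >P1;1mnma
  solvent accessibility
  TTTTTTTTTT
  TTFTT*
EOS

entries = parse_tem(tem)
puts entries['1mnma']['sequence']
puts entries['1mnma']['solvent accessibility']
puts entries['1egwa']['sequence']
```

Keeping every feature string the same length as the sequence makes it possible to look up the environment of a residue by its alignment column.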
136
+ 3. When you have two or more alignment files, list the paths to all of them in a separate file.
137
+
138
+ ~user $ ls -1 *.tem > TEMLIST
139
+ ~user $ cat TEMLIST
140
+ sample1.tem
141
+ sample2.tem
142
+ ...
143
+
144
+ 4. To produce substitution count matrices, type
145
+
146
+ ~user $ egor -l TEMLIST --output 0 -o substcount.mat
147
+
148
+ 5. To produce substitution probability matrices, type
149
+
150
+ ~user $ egor -l TEMLIST --output 1 -o substprob.mat
151
+
152
+ 6. To produce log odds ratio matrices, type
153
+
154
+ ~user $ egor -l TEMLIST --output 2 -o substlogo.mat
155
+
156
+ 7. To produce substitution data only from the sequence pairs within a given PID range, type the following. If you don't provide an output file name, 'allmat.dat' will be used.
157
+
158
+ ~user $ egor -l TEMLIST --pidmin 60 --pidmax 80 --output 1
159
+
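According to the option descriptions, `--pidmin` is an inclusive lower bound and `--pidmax` an exclusive upper bound. The Ruby sketch below shows that half-open range check together with one possible PID calculation. Both method names are hypothetical, and the PID definition used here (identical residues over columns where neither sequence has a gap) is an assumption; egor's own definition is not shown in this diff.

```ruby
# Percent identity between two aligned sequences of equal length.
# Assumption: columns with a gap ('-') in either sequence are ignored.
def percent_identity(seq1, seq2)
  pairs = seq1.chars.zip(seq2.chars).reject { |a, b| a == '-' || b == '-' }
  return 0.0 if pairs.empty?
  100.0 * pairs.count { |a, b| a == b } / pairs.size
end

# Keep a pair when pidmin <= PID < pidmax; nil means "no bound".
def in_pid_range?(pid, pidmin: nil, pidmax: nil)
  (pidmin.nil? || pid >= pidmin) && (pidmax.nil? || pid < pidmax)
end

pid = percent_identity('ACDEFG--', 'ACDQFGHI')
puts pid.round(1)
puts in_pid_range?(pid, pidmin: 60, pidmax: 80)
puts in_pid_range?(pid, pidmin: 60)
```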
160
+ 8. To change the clustering level (default 60), type
161
+
162
+ ~user $ egor -l TEMLIST --weight 80 --output 2
163
+
164
+ 9. Any positions masked with the character 'X' in any environmental feature will be excluded from the calculation of substitution counts.
165
+
166
+ 10. egor then produces a file containing all the matrices, which will look like the one below. For more details, please check these [notes](http://www-cryst.bioc.cam.ac.uk/~kenji/subst/NOTES "Kenji's NOTES").
167
+
168
+ #
169
+ # Environment-specific amino acid substitution matrices
170
+ # Creator: egor version 0.0.4
171
+ # Creation Date: 20/01/2009 14:45
172
+ #
173
+ # Definitions for structural environments:
174
+ # 5 features used
175
+ #
176
+ # secondary structure and phi angle;HEPC;HEPC;F;F
177
+ # solvent accessibility;TF;Aa;F;F
178
+ # hydrogen bond to DNA;TF;Hh;F;F
179
+ # water-mediated hydrogen bond to DNA;TF;Ww;F;F
180
+ # van der Waals contact to DNA;TF;Vv;F;F
181
+ #
182
+ # (read in from classdef.dat)
183
+ #
184
+ # Number of alignments: 86
185
+ # (list of .tem files read in from TEMLIST)
186
+ #
187
+ # Total number of environments: 64
188
+ #
189
+ # There are 21 amino acids considered.
190
+ # ACDEFGHIKLMNPQRSTVWYJ
191
+ #
192
+ # C: Cystine (the disulfide-bonded form)
193
+ # J: Cysteine (the free thiol form)
194
+ #
195
+ # Weighting scheme: clustering at PID 60 level
196
+ #
197
+ # ...
198
+ #
199
+ >HAHWV 0
200
+ # A C D E F G H I K L M N P Q R S T V W Y J
201
+ A 5 -6 0 0 -2 0 -2 -1 -1 -1 1 -1 -1 0 -1 1 0 0 -2 -2 -2
202
+ C -7 28 -8 -49 -3 -49 -2 -1 -11 -5 -1 -49 -6 -49 -49 -4 -6 -4 -49 3 9
203
+ D 0 -7 7 2 -3 0 0 -4 0 -3 -3 2 0 0 -2 1 0 -3 -5 -3 -6
204
+ E 0 -68 2 5 -3 -1 -1 -3 0 -3 -1 0 0 1 -1 0 0 -3 -3 -2 -7
205
+ F -2 -3 -3 -4 8 -4 -1 2 -4 1 0 -4 -4 -4 -4 -4 -2 1 2 3 -5
206
+ G 0 -67 0 -1 -4 9 -3 -4 -2 -3 -4 1 -1 -2 -3 0 -2 -3 -3 -3 -2
207
+ H -2 -2 0 -1 -1 -3 11 -3 -2 -3 -2 0 -2 -1 -1 -1 -1 -3 -2 0 -4
208
+ I -1 -1 -4 -3 2 -4 -3 6 -3 2 2 -4 -2 -2 -4 -3 -1 3 -1 0 -4
209
+ K -1 -10 0 0 -4 -2 -1 -3 5 -3 -2 0 0 1 2 -1 -1 -3 -4 -2 -5
210
+ L -1 -5 -3 -3 1 -3 -3 2 -3 5 2 -4 -1 -2 -2 -3 -1 1 0 -1 -4
211
+ M 1 -1 -3 -1 0 -4 -2 2 -3 2 8 -2 -2 -1 -2 -2 -1 1 -1 -1 -4
212
+ N -1 -66 2 0 -4 1 0 -4 0 -4 -2 8 -1 0 -1 1 0 -3 -5 -4 -5
213
+ P -1 -6 0 0 -3 -1 -2 -2 -1 -1 -2 -1 9 -1 -2 0 0 -2 -4 -3 -7
214
+ Q 0 -66 0 1 -4 -2 -1 -2 1 -2 -1 0 -1 6 0 0 -1 -2 -2 -2 -6
215
+ R -1 -69 -1 0 -4 -3 -1 -3 2 -2 -1 -1 -2 0 6 -1 -1 -3 -3 -2 -6
216
+ S 1 -4 1 0 -3 0 -1 -3 -1 -3 -2 1 0 0 -1 5 2 -2 -3 -1 -3
217
+ T 0 -5 -1 -1 -2 -2 -1 -1 -1 -1 -1 0 0 -1 -1 2 5 -1 -3 -2 -3
218
+ V 0 -4 -3 -3 1 -4 -3 3 -3 1 1 -3 -2 -2 -3 -2 -1 6 0 -1 -2
219
+ W -2 -61 -5 -3 2 -3 -2 -1 -4 0 -1 -5 -4 -2 -3 -3 -3 0 14 3 -6
220
+ Y -2 3 -3 -2 4 -3 0 0 -2 0 0 -4 -3 -2 -2 -1 -2 0 3 9 -3
221
+ J -3 9 -7 -8 -5 -2 -4 -4 -6 -4 -4 -5 -7 -6 -6 -3 -3 -2 -6 -3 15
222
+ U -3 15 -7 -8 -5 -3 -4 -4 -6 -4 -4 -5 -7 -6 -6 -3 -3 -2 -6 -3 15
223
+ ...
224
+
225
+ ## Repository
226
+
227
+ You can download a pre-built RubyGems package from
228
+
229
+ * rubyforge: [http://rubyforge.org/projects/egor](http://rubyforge.org/projects/egor "RubyForge")
230
+
231
+ or you can fetch the source from
232
+
233
+ * github: [http://github.com/semin/egor/tree/master](http://github.com/semin/egor/tree/master "GitHub")
234
+
235
+
236
+ ## Contact
237
+
238
+ Comments are welcome; please send an email to me (seminlee at gmail dot com).
239
+
240
+
241
+ ## License
242
+
243
+ <a rel="license" href="http://creativecommons.org/licenses/by-nc/2.0/uk/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc/2.0/uk/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/2.0/uk/">Creative Commons Attribution-Noncommercial 2.0 UK: England &amp; Wales License</a>.
data/egor.gemspec CHANGED
@@ -2,7 +2,7 @@
2
2
 
3
3
  Gem::Specification.new do |s|
4
4
  s.name = %q{egor}
5
- s.version = "0.9.1"
5
+ s.version = "0.9.2"
6
6
 
7
7
  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
8
8
  s.authors = ["Semin Lee"]
@@ -11,7 +11,7 @@ Gem::Specification.new do |s|
11
11
  s.email = ["seminlee@gmail.com"]
12
12
  s.executables = ["egor"]
13
13
  s.extra_rdoc_files = ["History.txt", "Manifest.txt", "PostInstall.txt", "website/index.txt"]
14
- s.files = ["History.txt", "Manifest.txt", "PostInstall.txt", "Rakefile", "bin/egor", "config/website.yml", "config/website.yml.sample", "egor.gemspec", "lib/egor.rb", "lib/egor/cli.rb", "lib/egor/environment.rb", "lib/egor/environment_class_hash.rb", "lib/egor/environment_feature.rb", "lib/egor/environment_feature_array.rb", "lib/egor/heatmap_array.rb", "lib/math_extensions.rb", "lib/narray_extensions.rb", "lib/nmatrix_extensions.rb", "lib/string_extensions.rb", "script/console", "script/destroy", "script/generate", "script/txt2html", "test/test_egor.rb", "test/test_egor_cli.rb", "test/test_egor_environment_class_hash.rb", "test/test_egor_environment_feature.rb", "test/test_helper.rb", "test/test_math_extensions.rb", "test/test_narray_extensions.rb", "test/test_nmatrix_extensions.rb", "test/test_string_extensions.rb", "website/index.html", "website/index.txt", "website/javascripts/rounded_corners_lite.inc.js", "website/stylesheets/screen.css", "website/template.html.erb"]
14
+ s.files = ["History.txt", "Manifest.txt", "PostInstall.txt", "README.markdown", "Rakefile", "bin/egor", "config/website.yml", "config/website.yml.sample", "egor.gemspec", "lib/egor.rb", "lib/egor/cli.rb", "lib/egor/environment.rb", "lib/egor/environment_class_hash.rb", "lib/egor/environment_feature.rb", "lib/egor/environment_feature_array.rb", "lib/egor/heatmap_array.rb", "lib/math_extensions.rb", "lib/narray_extensions.rb", "lib/nmatrix_extensions.rb", "lib/string_extensions.rb", "script/console", "script/destroy", "script/generate", "script/txt2html", "test/test_egor.rb", "test/test_egor_cli.rb", "test/test_egor_environment_class_hash.rb", "test/test_egor_environment_feature.rb", "test/test_helper.rb", "test/test_math_extensions.rb", "test/test_narray_extensions.rb", "test/test_nmatrix_extensions.rb", "test/test_string_extensions.rb", "website/index.html", "website/index.txt", "website/javascripts/rounded_corners_lite.inc.js", "website/stylesheets/screen.css", "website/template.html.erb"]
15
15
  s.has_rdoc = true
16
16
  s.post_install_message = %q{PostInstall.txt}
17
17
  s.rdoc_options = ["--main", "README.markdown"]
data/lib/egor/cli.rb CHANGED
@@ -72,7 +72,7 @@ Options:
72
72
  --pidmax DOUBLE: count substitutions only for pairs with PID smaller than this value (default none)
73
73
  --heatmap INTEGER:
74
74
  0 create a heat map file for each substitution table
75
- 1 create one big file containing all substitution tables
75
+ 1 create one big file containing all heat maps from substitution tables
76
76
  2 do both 0 and 1
77
77
  --heatmap-format INTEGER:
78
78
  0 for Portable Network Graphics (PNG) Format (default)
data/lib/egor.rb CHANGED
@@ -2,5 +2,5 @@ $:.unshift(File.dirname(__FILE__)) unless
2
2
  $:.include?(File.dirname(__FILE__)) || $:.include?(File.expand_path(File.dirname(__FILE__)))
3
3
 
4
4
  module Egor
5
- VERSION = '0.9.1'
5
+ VERSION = '0.9.2'
6
6
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: semin-egor
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.9.1
4
+ version: 0.9.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Semin Lee
@@ -82,6 +82,7 @@ files:
82
82
  - History.txt
83
83
  - Manifest.txt
84
84
  - PostInstall.txt
85
+ - README.markdown
85
86
  - Rakefile
86
87
  - bin/egor
87
88
  - config/website.yml