semin-egor 0.9.1 → 0.9.2
- data/README.markdown +243 -0
- data/egor.gemspec +2 -2
- data/lib/egor/cli.rb +1 -1
- data/lib/egor.rb +1 -1
- metadata +2 -1
data/README.markdown
ADDED
@@ -0,0 +1,243 @@
# egor

[http://www-cryst.bioc.cam.ac.uk/egor](http://www-cryst.bioc.cam.ac.uk/egor "Egor Homepage")

## Description

'egor' is a program for calculating environment-specific substitution tables from user-provided environmental class definitions and sequence alignments annotated with those environment classes.

## Features

* Environment-specific substitution table generation based on user-provided environmental class definitions
* Entropy-based smoothing procedures to cope with the sparse data problem
* BLOSUM-like weighting procedures using a PID (percentage identity) threshold
* Heat map generation for substitution tables

## Requirements

* ruby 1.8.7 or above ([http://www.ruby-lang.org](http://www.ruby-lang.org "Ruby"))
* rubygems 1.2.0 or above ([http://rubyforge.org/projects/rubygems/](http://rubyforge.org/projects/rubygems "RubyGems"))

The following gems will be installed automatically if you have RubyGems installed on your machine:

* narray ([http://narray.rubyforge.org](http://narray.rubyforge.org "NArray"))
* facets ([http://facets.rubyforge.org](http://facets.rubyforge.org "Ruby Facets"))
* bio ([http://bioruby.open-bio.org](http://bioruby.open-bio.org "BioRuby"))
* RMagick ([http://rmagick.rubyforge.org/](http://rmagick.rubyforge.org/ "RMagick"))

## Installation

    ~user $ sudo gem install egor

## Basic Usage

It's pretty much the same as Kenji's subst (http://www-cryst.bioc.cam.ac.uk/~kenji/subst/), so in most cases, you can swap 'subst' with 'egor'.

    ~user $ egor -l TEMLIST-file -c classdef.dat
    or
    ~user $ egor -f TEM-file -c classdef.dat

## Options

    --tem-file (-f) FILE: a tem file
    --tem-list (-l) FILE: a list of tem files
    --classdef (-c) FILE: a file for the definition of environments (default: 'classdef.dat')
    --outfile (-o) FILE: output filename (default: 'allmat.dat')
    --weight (-w) INTEGER: clustering level (PID) for the BLOSUM-like weighting (default: 60)
    --noweight: calculate substitution counts with no weights
    --smooth (-s) INTEGER:
        0 for partial smoothing (default)
        1 for full smoothing
    --p1smooth: perform smoothing for p1 probability calculation when using partial smoothing
    --nosmooth: perform no smoothing operation
    --cys (-y) INTEGER:
        0 for using C and J only for structure (default)
        1 for both structure and sequence
        2 for using only C for both (must be set when you have no 'disulphide' or 'disulfide' annotation in templates)
    --output INTEGER:
        0 for raw counts (no smoothing performed)
        1 for probabilities
        2 for log odds ratios (default)
    --noroundoff: do not round off log odds ratios
    --scale INTEGER: log odds ratio matrices in 1/n bit units (default: 3)
    --sigma DOUBLE: change the sigma value for smoothing (default: 5.0)
    --autosigma: automatically adjust the sigma value for smoothing
    --add DOUBLE: add this value to raw counts when deriving log odds ratios without smoothing (default: 1/#classes)
    --pidmin DOUBLE: count substitutions only for pairs with PID equal to or greater than this value (default: none)
    --pidmax DOUBLE: count substitutions only for pairs with PID smaller than this value (default: none)
    --heatmap INTEGER:
        0 create a heat map file for each substitution table
        1 create one big file containing all heat maps from substitution tables
        2 do both 0 and 1
    --heatmap-format INTEGER:
        0 for Portable Network Graphics (PNG) Format (default)
        1 for Graphics Interchange Format (GIF)
        2 for Joint Photographic Experts Group (JPEG) Format
        3 for Microsoft Windows bitmap (BMP) Format
        4 for Portable Document Format (PDF)
    --heatmap-columns INTEGER: number of tables to print in a row when --heatmap 1 or 2 is set (default: sqrt(no. of tables))
    --heatmap-stem STRING: stem for a file name when --heatmap 1 or 2 is set (default: 'heatmap')
    --heatmap-value: print values in the cells when generating heat maps
    --verbose (-v) INTEGER:
        0 for ERROR level
        1 for WARN or above level (default)
        2 for INFO or above level
        3 for DEBUG or above level
    --version: print version
    --help (-h): show help
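
The `--scale` and `--noroundoff` options describe log odds ratios expressed in 1/n bit units. The sketch below illustrates that unit conversion only. It assumes the conventional definition, n · log2(observed / background), which the README does not spell out, and `log_odds_score` is a hypothetical helper rather than part of egor.

```ruby
# Hypothetical helper (not egor's API): convert an observed/background
# probability pair into a log odds score in 1/scale bit units.
# Assumes the conventional scale * log2(observed / background) definition.
def log_odds_score(observed, background, scale: 3, round_off: true)
  unless observed > 0 && background > 0
    raise ArgumentError, "probabilities must be positive"
  end

  score = scale * Math.log2(observed.to_f / background)
  # round_off: false mimics the intent of --noroundoff
  round_off ? score.round : score
end

puts log_odds_score(0.2, 0.05)                   # default 1/3 bit units, rounded
puts log_odds_score(0.2, 0.05, round_off: false) # same score, unrounded
puts log_odds_score(0.25, 0.5, scale: 2)         # half-bit units
```

With a larger scale the same probability ratio maps to a larger integer, so less precision is lost when scores are rounded.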

## Usage

1. Prepare an environmental class definition file. For more details, please check these [notes](http://www-cryst.bioc.cam.ac.uk/~kenji/subst/NOTES "Kenji's NOTES").

        ~user $ cat classdef.dat
        #
        # name of feature (string); values adopted in .tem file (string); class labels assigned for each value (string);
        # constrained or not (T or F); silent (used as masks)? (T or F)
        #
        secondary structure and phi angle;HEPC;HEPC;T;F
        solvent accessibility;TF;Aa;F;F
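
As a rough illustration of the file layout above, the following sketch parses the semicolon-separated fields and enumerates the environment labels they imply. `Feature` and `parse_classdef` are hypothetical names, not egor's API. The sketch also assumes that environments are all combinations of one class label per feature, which is consistent with the sample output later in this README (five features giving 64 environments), and it ignores the constrained and silent flags when enumerating.

```ruby
# Hypothetical struct and parser for classdef.dat lines; not egor's own API.
Feature = Struct.new(:name, :values, :labels, :constrained, :silent)

def parse_classdef(text)
  text.each_line
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?("#") }
      .map do |line|
        name, values, labels, constrained, silent = line.split(";")
        Feature.new(name, values.chars, labels.chars,
                    constrained == "T", silent == "T")
      end
end

classdef = <<~DAT
  #
  # name of feature (string); values adopted in .tem file (string); class labels assigned for each value (string);
  # constrained or not (T or F); silent (used as masks)? (T or F)
  #
  secondary structure and phi angle;HEPC;HEPC;T;F
  solvent accessibility;TF;Aa;F;F
DAT

features = parse_classdef(classdef)

# Assumption: one environment per combination of class labels across features.
first, *rest = features.map(&:labels)
environments = first.product(*rest).map(&:join)

puts environments.size      # 4 labels x 2 labels
puts environments.inspect
```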

2. Prepare structural alignments, annotated with the above environmental classes, in [PIR format](http://caps.ncbs.res.in/gendis/pir.html "PIR Format").

        ~user $ cat sample1.tem
        >P1;1mnma
        sequence
        QKERRKIEIKFIENKTRRHVTFSKRKHGIMKKAFELSVLTGTQVLLLVVSETGLVYTFSTPKFEPIVTQQEGRNL
        IQACLNAPDD*
        >P1;1egwa
        sequence
        --GRKKIQITRIMDERNRQVTFTKRKFGLMKKAYELSVLCDCEIALIIFNSSNKLFQYASTDMDKVLLKYTEY--
        ----------*
        >P1;1mnma
        secondary structure and phi angle
        CPCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHPCCCEEEEECCCPCEEEEECCCCCHHHHCHHHHHH
        HHHHHCCCCP*
        >P1;1egwa
        secondary structure and phi angle
        --CCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHCPCCCEEEEECCCPCEEEEECCCHHHHHHHHHHC--
        ----------*
        >P1;1mnma
        solvent accessibility
        TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTFTTTTTTTTTTTTTTTT
        TTTTTTTTTT*
        >P1;1egwa
        solvent accessibility
        --TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTFTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT--
        ----------*
        ...
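
To make the .tem layout concrete, here is a minimal reader for entries of the shape shown above: a `>P1;<id>` header, a feature-name line, then data lines terminated by `*`. `parse_tem` is a hypothetical helper, not how egor reads these files, and the input is shortened toy data rather than the real sample.

```ruby
# Hypothetical parser for the .tem layout shown above; not egor's own reader.
# Returns { feature_name => { entry_id => aligned_string } }.
def parse_tem(text)
  entries = Hash.new { |hash, feature| hash[feature] = {} }
  text.split(/^>P1;/).reject { |chunk| chunk.strip.empty? }.each do |chunk|
    id, feature, *data = chunk.lines.map(&:strip).reject(&:empty?)
    # Join wrapped data lines and drop the terminating "*"
    entries[feature][id] = data.join.chomp("*")
  end
  entries
end

# Shortened toy input (not the real sample1.tem content)
tem = <<~TEM
  >P1;1mnma
  sequence
  QKERRK
  IQ*
  >P1;1egwa
  sequence
  --GRKK
  --*
  >P1;1mnma
  solvent accessibility
  TTTTTT
  TT*
  >P1;1egwa
  solvent accessibility
  --TTTT
  --*
TEM

parsed = parse_tem(tem)
puts parsed["sequence"].inspect
puts parsed["solvent accessibility"].inspect
```

Keying by feature name first keeps the sequence row and each annotation row for the same entry easy to compare column by column.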

3. When you have two or more alignment files, you should make a separate file containing all the paths for the alignment files.

        ~user $ ls -1 *.tem > TEMLIST
        ~user $ cat TEMLIST
        sample1.tem
        sample2.tem
        ...
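
The same list can be produced from Ruby when the alignment files are generated by a script. This is only a convenience sketch: `write_temlist` is a hypothetical helper, and the file names are placeholders created in a temporary directory.

```ruby
require "tmpdir"

# Hypothetical helper: write the sorted *.tem paths found in dir to
# list_path, one path per line, and return the paths.
def write_temlist(dir, list_path)
  paths = Dir.glob(File.join(dir, "*.tem")).sort
  File.write(list_path, paths.join("\n") + "\n")
  paths
end

Dir.mktmpdir do |dir|
  # Placeholder files; only the *.tem ones should end up in the list
  %w[sample2.tem sample1.tem notes.txt].each do |name|
    File.write(File.join(dir, name), "")
  end

  list = File.join(dir, "TEMLIST")
  write_temlist(dir, list)
  puts File.readlines(list, chomp: true).map { |path| File.basename(path) }.inspect
end
```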

4. To produce substitution count matrices, type

        ~user $ egor -l TEMLIST --output 0 -o substcount.mat

5. To produce substitution probability matrices, type

        ~user $ egor -l TEMLIST --output 1 -o substprob.mat

6. To produce log odds ratio matrices, type

        ~user $ egor -l TEMLIST --output 2 -o substlogo.mat

7. To produce substitution data only from the sequence pairs within a given PID range, type the following (if you don't provide an output file name, 'allmat.dat' will be used)

        ~user $ egor -l TEMLIST --pidmin 60 --pidmax 80 --output 1
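
The range test behind `--pidmin` and `--pidmax` can be sketched as follows. The boundary handling (inclusive minimum, exclusive maximum) follows the option descriptions in this README. The PID computation itself is an assumption: identical residues divided by the number of columns where neither sequence has a gap. egor may define PID differently, and both function names are hypothetical.

```ruby
# Hypothetical PID helper: percent identity over columns where neither
# aligned sequence has a gap. egor's own definition may differ.
def percent_identity(seq_a, seq_b)
  unless seq_a.length == seq_b.length
    raise ArgumentError, "aligned sequences must have equal length"
  end

  pairs = seq_a.chars.zip(seq_b.chars).reject { |a, b| a == "-" || b == "-" }
  return 0.0 if pairs.empty?

  100.0 * pairs.count { |a, b| a == b } / pairs.size
end

# Mirrors the documented option semantics: PID >= pidmin and PID < pidmax.
# A nil bound means "no limit", matching the options' default of none.
def in_pid_range?(pid, pidmin: nil, pidmax: nil)
  (pidmin.nil? || pid >= pidmin) && (pidmax.nil? || pid < pidmax)
end

pid = percent_identity("ACDEFG-H", "ACDQFGKH")
puts pid
puts in_pid_range?(pid, pidmin: 60, pidmax: 80)
```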

8. To change the clustering level (default 60), type

        ~user $ egor -l TEMLIST --weight 80 --output 2

9. Any positions masked with the character 'X' in any environmental feature will be excluded from the calculation of substitution counts.

10. egor will then produce a file containing all the matrices, which will look like the one below. For more details, please check these [notes](http://www-cryst.bioc.cam.ac.uk/~kenji/subst/NOTES "Kenji's NOTES").

        #
        # Environment-specific amino acid substitution matrices
        # Creator: egor version 0.0.4
        # Creation Date: 20/01/2009 14:45
        #
        # Definitions for structural environments:
        # 5 features used
        #
        # secondary structure and phi angle;HEPC;HEPC;F;F
        # solvent accessibility;TF;Aa;F;F
        # hydrogen bond to DNA;TF;Hh;F;F
        # water-mediated hydrogen bond to DNA;TF;Ww;F;F
        # van der Waals contact to DNA;TF;Vv;F;F
        #
        # (read in from classdef.dat)
        #
        # Number of alignments: 86
        # (list of .tem files read in from TEMLIST)
        #
        # Total number of environments: 64
        #
        # There are 21 amino acids considered.
        # ACDEFGHIKLMNPQRSTVWYJ
        #
        # C: Cystine (the disulfide-bonded form)
        # J: Cysteine (the free thiol form)
        #
        # Weighting scheme: clustering at PID 60 level
        #
        # ...
        #
        >HAHWV 0
        # A C D E F G H I K L M N P Q R S T V W Y J
        A 5 -6 0 0 -2 0 -2 -1 -1 -1 1 -1 -1 0 -1 1 0 0 -2 -2 -2
        C -7 28 -8 -49 -3 -49 -2 -1 -11 -5 -1 -49 -6 -49 -49 -4 -6 -4 -49 3 9
        D 0 -7 7 2 -3 0 0 -4 0 -3 -3 2 0 0 -2 1 0 -3 -5 -3 -6
        E 0 -68 2 5 -3 -1 -1 -3 0 -3 -1 0 0 1 -1 0 0 -3 -3 -2 -7
        F -2 -3 -3 -4 8 -4 -1 2 -4 1 0 -4 -4 -4 -4 -4 -2 1 2 3 -5
        G 0 -67 0 -1 -4 9 -3 -4 -2 -3 -4 1 -1 -2 -3 0 -2 -3 -3 -3 -2
        H -2 -2 0 -1 -1 -3 11 -3 -2 -3 -2 0 -2 -1 -1 -1 -1 -3 -2 0 -4
        I -1 -1 -4 -3 2 -4 -3 6 -3 2 2 -4 -2 -2 -4 -3 -1 3 -1 0 -4
        K -1 -10 0 0 -4 -2 -1 -3 5 -3 -2 0 0 1 2 -1 -1 -3 -4 -2 -5
        L -1 -5 -3 -3 1 -3 -3 2 -3 5 2 -4 -1 -2 -2 -3 -1 1 0 -1 -4
        M 1 -1 -3 -1 0 -4 -2 2 -3 2 8 -2 -2 -1 -2 -2 -1 1 -1 -1 -4
        N -1 -66 2 0 -4 1 0 -4 0 -4 -2 8 -1 0 -1 1 0 -3 -5 -4 -5
        P -1 -6 0 0 -3 -1 -2 -2 -1 -1 -2 -1 9 -1 -2 0 0 -2 -4 -3 -7
        Q 0 -66 0 1 -4 -2 -1 -2 1 -2 -1 0 -1 6 0 0 -1 -2 -2 -2 -6
        R -1 -69 -1 0 -4 -3 -1 -3 2 -2 -1 -1 -2 0 6 -1 -1 -3 -3 -2 -6
        S 1 -4 1 0 -3 0 -1 -3 -1 -3 -2 1 0 0 -1 5 2 -2 -3 -1 -3
        T 0 -5 -1 -1 -2 -2 -1 -1 -1 -1 -1 0 0 -1 -1 2 5 -1 -3 -2 -3
        V 0 -4 -3 -3 1 -4 -3 3 -3 1 1 -3 -2 -2 -3 -2 -1 6 0 -1 -2
        W -2 -61 -5 -3 2 -3 -2 -1 -4 0 -1 -5 -4 -2 -3 -3 -3 0 14 3 -6
        Y -2 3 -3 -2 4 -3 0 0 -2 0 0 -4 -3 -2 -2 -1 -2 0 3 9 -3
        J -3 9 -7 -8 -5 -2 -4 -4 -6 -4 -4 -5 -7 -6 -6 -3 -3 -2 -6 -3 15
        U -3 15 -7 -8 -5 -3 -4 -4 -6 -4 -4 -5 -7 -6 -6 -3 -3 -2 -6 -3 15
        ...
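
For downstream scripts, one matrix block of the shape shown above can be read into nested hashes. This is a sketch under assumptions: `parse_matrix_block` is a hypothetical helper, the input is a 3x3 excerpt from the top-left corner of the sample, and the code does not handle the `#` header comments or the `...` elision.

```ruby
# Hypothetical reader for a single matrix block: a ">LABEL index" line,
# a "# A C D ..." column header, then one row of integer scores per residue.
def parse_matrix_block(text)
  lines = text.lines.map(&:strip).reject(&:empty?)
  label, index = lines.shift.delete_prefix(">").split
  columns = lines.shift.delete_prefix("#").split

  rows = lines.to_h do |line|
    residue, *scores = line.split
    [residue, columns.zip(scores.map(&:to_i)).to_h]
  end

  { label: label, index: index.to_i, rows: rows }
end

# 3x3 excerpt from the top-left corner of the sample output above
block = <<~MAT
  >HAHWV 0
  # A C D
  A 5 -6 0
  C -7 28 -8
  D 0 -7 7
MAT

matrix = parse_matrix_block(block)
puts matrix[:label]
puts matrix[:rows]["C"].inspect
```

The lookup is `matrix[:rows][row_residue][column_residue]`. Which axis holds the original residue and which the substituted one is described in the linked notes, not assumed here.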

## Repository

You can download a pre-built RubyGems package from

* rubyforge: [http://rubyforge.org/projects/egor](http://rubyforge.org/projects/egor "RubyForge")

or you can fetch the source from

* github: [http://github.com/semin/egor/tree/master](http://github.com/semin/egor/tree/master "GitHub")

## Contact

Comments are welcome; please send an email to me (seminlee at gmail dot com).

## License

<a rel="license" href="http://creativecommons.org/licenses/by-nc/2.0/uk/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc/2.0/uk/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/2.0/uk/">Creative Commons Attribution-Noncommercial 2.0 UK: England & Wales License</a>.
data/egor.gemspec
CHANGED
@@ -2,7 +2,7 @@
 
 Gem::Specification.new do |s|
   s.name = %q{egor}
-  s.version = "0.9.1"
+  s.version = "0.9.2"
 
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
   s.authors = ["Semin Lee"]
@@ -11,7 +11,7 @@ Gem::Specification.new do |s|
   s.email = ["seminlee@gmail.com"]
   s.executables = ["egor"]
   s.extra_rdoc_files = ["History.txt", "Manifest.txt", "PostInstall.txt", "website/index.txt"]
-  s.files = ["History.txt", "Manifest.txt", "PostInstall.txt", "Rakefile", "bin/egor", "config/website.yml", "config/website.yml.sample", "egor.gemspec", "lib/egor.rb", "lib/egor/cli.rb", "lib/egor/environment.rb", "lib/egor/environment_class_hash.rb", "lib/egor/environment_feature.rb", "lib/egor/environment_feature_array.rb", "lib/egor/heatmap_array.rb", "lib/math_extensions.rb", "lib/narray_extensions.rb", "lib/nmatrix_extensions.rb", "lib/string_extensions.rb", "script/console", "script/destroy", "script/generate", "script/txt2html", "test/test_egor.rb", "test/test_egor_cli.rb", "test/test_egor_environment_class_hash.rb", "test/test_egor_environment_feature.rb", "test/test_helper.rb", "test/test_math_extensions.rb", "test/test_narray_extensions.rb", "test/test_nmatrix_extensions.rb", "test/test_string_extensions.rb", "website/index.html", "website/index.txt", "website/javascripts/rounded_corners_lite.inc.js", "website/stylesheets/screen.css", "website/template.html.erb"]
+  s.files = ["History.txt", "Manifest.txt", "PostInstall.txt", "README.markdown", "Rakefile", "bin/egor", "config/website.yml", "config/website.yml.sample", "egor.gemspec", "lib/egor.rb", "lib/egor/cli.rb", "lib/egor/environment.rb", "lib/egor/environment_class_hash.rb", "lib/egor/environment_feature.rb", "lib/egor/environment_feature_array.rb", "lib/egor/heatmap_array.rb", "lib/math_extensions.rb", "lib/narray_extensions.rb", "lib/nmatrix_extensions.rb", "lib/string_extensions.rb", "script/console", "script/destroy", "script/generate", "script/txt2html", "test/test_egor.rb", "test/test_egor_cli.rb", "test/test_egor_environment_class_hash.rb", "test/test_egor_environment_feature.rb", "test/test_helper.rb", "test/test_math_extensions.rb", "test/test_narray_extensions.rb", "test/test_nmatrix_extensions.rb", "test/test_string_extensions.rb", "website/index.html", "website/index.txt", "website/javascripts/rounded_corners_lite.inc.js", "website/stylesheets/screen.css", "website/template.html.erb"]
   s.has_rdoc = true
   s.post_install_message = %q{PostInstall.txt}
   s.rdoc_options = ["--main", "README.markdown"]
data/lib/egor/cli.rb
CHANGED
@@ -72,7 +72,7 @@ Options:
     --pidmax DOUBLE: count substitutions only for pairs with PID smaller than this value (default none)
     --heatmap INTEGER:
         0 create a heat map file for each substitution table
-        1 create one big file containing all substitution tables
+        1 create one big file containing all heat maps from substitution tables
         2 do both 0 and 1
     --heatmap-format INTEGER:
         0 for Portable Network Graphics (PNG) Format (default)
data/lib/egor.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: semin-egor
 version: !ruby/object:Gem::Version
-  version: 0.9.1
+  version: 0.9.2
 platform: ruby
 authors:
 - Semin Lee
@@ -82,6 +82,7 @@ files:
 - History.txt
 - Manifest.txt
 - PostInstall.txt
+- README.markdown
 - Rakefile
 - bin/egor
 - config/website.yml