bio 0.7.1 → 1.0.0
- data/bin/bioruby +71 -27
- data/bin/br_biofetch.rb +5 -17
- data/bin/br_bioflat.rb +14 -26
- data/bin/br_biogetseq.rb +6 -18
- data/bin/br_pmfetch.rb +6 -16
- data/doc/Changes-0.7.rd +35 -0
- data/doc/KEGG_API.rd +287 -172
- data/doc/KEGG_API.rd.ja +273 -160
- data/doc/Tutorial.rd +18 -9
- data/doc/Tutorial.rd.ja +656 -138
- data/lib/bio.rb +6 -24
- data/lib/bio/alignment.rb +5 -5
- data/lib/bio/appl/blast.rb +132 -98
- data/lib/bio/appl/blast/format0.rb +9 -19
- data/lib/bio/appl/blast/wublast.rb +5 -18
- data/lib/bio/appl/emboss.rb +40 -47
- data/lib/bio/appl/hmmer.rb +116 -82
- data/lib/bio/appl/hmmer/report.rb +509 -364
- data/lib/bio/appl/spidey/report.rb +7 -18
- data/lib/bio/data/na.rb +3 -21
- data/lib/bio/db.rb +3 -21
- data/lib/bio/db/aaindex.rb +147 -52
- data/lib/bio/db/embl/common.rb +27 -6
- data/lib/bio/db/embl/embl.rb +18 -10
- data/lib/bio/db/embl/sptr.rb +87 -67
- data/lib/bio/db/embl/swissprot.rb +32 -3
- data/lib/bio/db/embl/trembl.rb +32 -3
- data/lib/bio/db/embl/uniprot.rb +32 -3
- data/lib/bio/db/fasta.rb +327 -289
- data/lib/bio/db/medline.rb +25 -4
- data/lib/bio/db/nbrf.rb +12 -20
- data/lib/bio/db/pdb.rb +4 -1
- data/lib/bio/db/pdb/chemicalcomponent.rb +240 -0
- data/lib/bio/db/pdb/pdb.rb +13 -8
- data/lib/bio/db/rebase.rb +93 -97
- data/lib/bio/feature.rb +2 -31
- data/lib/bio/io/ddbjxml.rb +167 -139
- data/lib/bio/io/fastacmd.rb +89 -56
- data/lib/bio/io/flatfile.rb +994 -278
- data/lib/bio/io/flatfile/index.rb +257 -194
- data/lib/bio/io/flatfile/indexer.rb +37 -29
- data/lib/bio/reference.rb +147 -64
- data/lib/bio/sequence.rb +57 -417
- data/lib/bio/sequence/aa.rb +64 -0
- data/lib/bio/sequence/common.rb +175 -0
- data/lib/bio/sequence/compat.rb +68 -0
- data/lib/bio/sequence/format.rb +134 -0
- data/lib/bio/sequence/generic.rb +24 -0
- data/lib/bio/sequence/na.rb +189 -0
- data/lib/bio/shell.rb +9 -23
- data/lib/bio/shell/core.rb +130 -125
- data/lib/bio/shell/demo.rb +143 -0
- data/lib/bio/shell/{session.rb → interface.rb} +42 -40
- data/lib/bio/shell/object.rb +52 -0
- data/lib/bio/shell/plugin/codon.rb +4 -22
- data/lib/bio/shell/plugin/emboss.rb +23 -0
- data/lib/bio/shell/plugin/entry.rb +34 -25
- data/lib/bio/shell/plugin/flatfile.rb +5 -23
- data/lib/bio/shell/plugin/keggapi.rb +11 -24
- data/lib/bio/shell/plugin/midi.rb +5 -23
- data/lib/bio/shell/plugin/obda.rb +4 -22
- data/lib/bio/shell/plugin/seq.rb +6 -24
- data/lib/bio/shell/rails/Rakefile +10 -0
- data/lib/bio/shell/rails/app/controllers/application.rb +4 -0
- data/lib/bio/shell/rails/app/controllers/shell_controller.rb +94 -0
- data/lib/bio/shell/rails/app/helpers/application_helper.rb +3 -0
- data/lib/bio/shell/rails/app/models/shell_connection.rb +30 -0
- data/lib/bio/shell/rails/app/views/layouts/shell.rhtml +37 -0
- data/lib/bio/shell/rails/app/views/shell/history.rhtml +5 -0
- data/lib/bio/shell/rails/app/views/shell/index.rhtml +2 -0
- data/lib/bio/shell/rails/app/views/shell/show.rhtml +13 -0
- data/lib/bio/shell/rails/config/boot.rb +19 -0
- data/lib/bio/shell/rails/config/database.yml +85 -0
- data/lib/bio/shell/rails/config/environment.rb +53 -0
- data/lib/bio/shell/rails/config/environments/development.rb +19 -0
- data/lib/bio/shell/rails/config/environments/production.rb +19 -0
- data/lib/bio/shell/rails/config/environments/test.rb +19 -0
- data/lib/bio/shell/rails/config/routes.rb +19 -0
- data/lib/bio/shell/rails/doc/README_FOR_APP +2 -0
- data/lib/bio/shell/rails/public/404.html +8 -0
- data/lib/bio/shell/rails/public/500.html +8 -0
- data/lib/bio/shell/rails/public/dispatch.cgi +10 -0
- data/lib/bio/shell/rails/public/dispatch.fcgi +24 -0
- data/lib/bio/shell/rails/public/dispatch.rb +10 -0
- data/lib/bio/shell/rails/public/favicon.ico +0 -0
- data/lib/bio/shell/rails/public/images/icon.png +0 -0
- data/lib/bio/shell/rails/public/images/rails.png +0 -0
- data/lib/bio/shell/rails/public/index.html +277 -0
- data/lib/bio/shell/rails/public/javascripts/controls.js +750 -0
- data/lib/bio/shell/rails/public/javascripts/dragdrop.js +584 -0
- data/lib/bio/shell/rails/public/javascripts/effects.js +854 -0
- data/lib/bio/shell/rails/public/javascripts/prototype.js +1785 -0
- data/lib/bio/shell/rails/public/robots.txt +1 -0
- data/lib/bio/shell/rails/public/stylesheets/main.css +187 -0
- data/lib/bio/shell/rails/script/about +3 -0
- data/lib/bio/shell/rails/script/breakpointer +3 -0
- data/lib/bio/shell/rails/script/console +3 -0
- data/lib/bio/shell/rails/script/destroy +3 -0
- data/lib/bio/shell/rails/script/generate +3 -0
- data/lib/bio/shell/rails/script/performance/benchmarker +3 -0
- data/lib/bio/shell/rails/script/performance/profiler +3 -0
- data/lib/bio/shell/rails/script/plugin +3 -0
- data/lib/bio/shell/rails/script/process/reaper +3 -0
- data/lib/bio/shell/rails/script/process/spawner +3 -0
- data/lib/bio/shell/rails/script/process/spinner +3 -0
- data/lib/bio/shell/rails/script/runner +3 -0
- data/lib/bio/shell/rails/script/server +42 -0
- data/lib/bio/shell/rails/test/test_helper.rb +28 -0
- data/lib/bio/shell/web.rb +90 -0
- data/lib/bio/util/contingency_table.rb +231 -225
- data/sample/any2fasta.rb +59 -0
- data/test/data/HMMER/hmmpfam.out +64 -0
- data/test/data/HMMER/hmmsearch.out +88 -0
- data/test/data/aaindex/DAYM780301 +30 -0
- data/test/data/aaindex/PRAM900102 +20 -0
- data/test/data/bl2seq/cd8a_cd8b_blastp.bl2seq +53 -0
- data/test/data/bl2seq/cd8a_p53_e-5blastp.bl2seq +37 -0
- data/test/data/blast/{eco:b0002.faa → b0002.faa} +0 -0
- data/test/data/blast/{eco:b0002.faa.m0 → b0002.faa.m0} +2 -2
- data/test/data/blast/{eco:b0002.faa.m7 → b0002.faa.m7} +1 -1
- data/test/data/blast/{eco:b0002.faa.m8 → b0002.faa.m8} +0 -0
- data/test/unit/bio/appl/bl2seq/test_report.rb +134 -0
- data/test/unit/bio/appl/blast/test_report.rb +15 -12
- data/test/unit/bio/appl/blast/test_xmlparser.rb +4 -4
- data/test/unit/bio/appl/hmmer/test_report.rb +355 -0
- data/test/unit/bio/appl/test_blast.rb +5 -5
- data/test/unit/bio/data/test_na.rb +9 -18
- data/test/unit/bio/db/pdb/test_pdb.rb +169 -0
- data/test/unit/bio/db/test_aaindex.rb +197 -0
- data/test/unit/bio/io/test_fastacmd.rb +55 -0
- data/test/unit/bio/sequence/test_aa.rb +102 -0
- data/test/unit/bio/sequence/test_common.rb +178 -0
- data/test/unit/bio/sequence/test_compat.rb +82 -0
- data/test/unit/bio/sequence/test_na.rb +242 -0
- data/test/unit/bio/shell/plugin/test_seq.rb +29 -19
- data/test/unit/bio/test_alignment.rb +15 -7
- data/test/unit/bio/test_reference.rb +198 -0
- data/test/unit/bio/test_sequence.rb +4 -49
- data/test/unit/bio/test_shell.rb +2 -2
- metadata +118 -15
- data/lib/bio/io/brdb.rb +0 -103
- data/lib/bioruby.rb +0 -34
@@ -0,0 +1 @@
+# See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
@@ -0,0 +1,187 @@
+body { background-color: #fff; color: #333; }
+
+body, p, td {
+  font-family: verdana, arial, helvetica, sans-serif;
+  font-size: 13px;
+  line-height: 18px;
+}
+
+pre {
+  background-color: #eee;
+  padding: 10px;
+  font-size: 11px;
+}
+
+a { color: #000; }
+a:visited { color: #666; }
+a:hover { color: #fff; background-color:#000; }
+
+.fieldWithErrors {
+  padding: 2px;
+  background-color: red;
+  display: table;
+}
+
+table {
+  text-align: top;
+}
+
+#ErrorExplanation {
+  width: 400px;
+  border: 2px solid red;
+  padding: 7px;
+  padding-bottom: 12px;
+  margin-bottom: 20px;
+  background-color: #f0f0f0;
+}
+
+#ErrorExplanation h2 {
+  text-align: left;
+  font-weight: bold;
+  padding: 5px 5px 5px 15px;
+  font-size: 12px;
+  margin: -7px;
+  background-color: #c00;
+  color: #fff;
+}
+
+#ErrorExplanation p {
+  color: #333;
+  margin-bottom: 0;
+  padding: 5px;
+}
+
+#ErrorExplanation ul li {
+  font-size: 12px;
+  list-style: square;
+}
+
+h1{
+  color: #333;
+  padding: 10px;
+  margin: 12px;
+}
+
+#banner{
+  margin-left: 5em;
+  margin-right: -6px;
+  text-align: center;
+  font-size: 30px;
+  border-top: 1px solid silver;
+  border-bottom: 1px solid silver;
+  padding: 10px 0px 10px 0px;
+}
+
+tr{
+  vertical-align: top;
+  text-align: left;
+}
+
+#side img{
+  background-color: black;
+  width:12em;
+}
+
+#side{
+  position: absolute;
+  margin: -13px;
+  top: 1em;
+  left: 1em;
+  width: 10em;
+}
+
+#side ul{
+  font-size: 13px;
+  margin: 0em;
+}
+
+#side h2{
+  font-size: 12px;
+  text-align: center;
+  width: 13em;
+  background-color: black;
+  color: white;
+}
+#side input{
+  text-align: center;
+}
+
+#main {
+  margin-left: 10em;
+  padding-top: 4x;
+  padding-left: 2em;
+  background: white;
+}
+
+.main {
+  margin-left: 10em;
+  padding-top: 4x;
+  padding-left: 2em;
+  background: white;
+}
+
+#menu {
+  margin-left: 10em;
+  padding-top: 4x;
+  padding-left: 2em;
+  background: white;
+}
+
+div.uploadStatus {
+  margin: 5px;
+}
+
+div.progressBar {
+  margin: 5px;
+}
+
+div.progressBar div.border {
+  background-color: #fff;
+  border: 1px solid grey;
+  width: 100%;
+}
+
+div.progressBar div.background {
+  background-color: #333;
+  height: 18px;
+  width: 0%;
+}
+
+.tabs {
+  position:relative;
+  height: 20px;
+  margin: 0;
+  padding: 0;
+  background: #aaa repeat-x;
+  overflow:hidden
+}
+
+.tabs li {
+  display:inline;
+}
+
+.tabs a:hover, .tabs a.tab-active {
+  color:#333;
+  background:#fff url("bar_on.gif") repeat-x;
+  border-right: 1px solid #fff
+}
+
+
+.tabs a {
+  height: 27px;
+  font:12px verdana, helvetica, sans-serif;
+  font-weight:bold;
+  position:relative;
+  padding:6px 10px 10px 10px;
+  margin: 0px -4px 0px 0px;
+  color:#333;
+  text-decoration:none;
+  border-left:1px solid #fff;
+  border-right:1px solid #333;
+}
+.tab-container {
+  background: #fff;
+  border:1px solid #555;
+}
+.tab-panes {
+  margin: 3px }
@@ -0,0 +1,42 @@
+#!/usr/bin/env ruby
+#
+# = BioRuby shell on Rails server - GUI for the BioRuby shell
+#
+# Copyright:: Copyright (C) 2006
+# Nobuya Tanaka <t@chemruby.org>,
+# Toshiaki Katayama <k@bioruby.org>
+# License:: Ruby's
+#
+# $Id: server,v 1.1 2006/02/27 11:16:23 k Exp $
+#
+
+require 'bio/shell'
+require 'drb/drb'
+
+require './app/models/shell_connection'
+$drb_server = ShellConnection.new
+
+## Access Control List
+#
+# require 'drb/acl'
+#
+# list = %w(deny all
+# allow 127.0.0.1
+# )
+# acl = ACL.new(list, ACL::DENY_ALLOW)
+# DRb.install_acl(acl)
+#
+
+STDOUT.sync = true
+
+#uri = "druby://localhost:0"
+uri = 'druby://localhost:81064' # baioroji-
+DRb.start_service(uri, $drb_server)
+puts DRb.uri
+
+puts "starting ..."
+
+require './config/boot'
+require 'commands/server'
+
+puts "exiting ..."
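The server script above publishes a front object over dRuby and leaves its access control list commented out. As a minimal sketch of that pattern (not the BioRuby code itself), the following starts a DRb service with a loopback-only ACL and talks to it by URI. `DemoConnection` and `drb_round_trip` are hypothetical names invented for illustration; the real `ShellConnection` model is not shown in this diff. Port 0 is used because the hard-coded 81064 is above the largest valid TCP port (65535), so it may not bind as written.

```ruby
require 'drb/drb'
require 'drb/acl'

# Hypothetical stand-in for ShellConnection: a tiny key/value front object.
class DemoConnection
  def initialize
    @vars = {}
  end

  def []=(name, value)
    @vars[name] = value
  end

  def [](name)
    @vars[name]
  end

  def delete(name)
    @vars.delete(name)
  end
end

# Start a service, store and read back one value, then shut down again.
def drb_round_trip
  # Deny every client except those connecting from the loopback address.
  acl = ACL.new(%w(deny all allow 127.0.0.1), ACL::DENY_ALLOW)
  DRb.install_acl(acl)

  # Port 0 asks the OS for a free port; DRb.uri then reports the real one.
  DRb.start_service('druby://127.0.0.1:0', DemoConnection.new)
  uri = DRb.uri

  # A client addresses the front object by URI. Here the client lives in
  # the same process, so this mainly shows the shape of the calls.
  remote = DRbObject.new_with_uri(uri)
  remote['seq'] = 'atgc'
  value = remote['seq']
  remote.delete('seq')
  [uri, value, remote['seq']]
ensure
  DRb.stop_service
end

uri, value, gone = drb_round_trip
puts uri
puts value.inspect
puts gone.inspect
```

In a real deployment the client would run in a separate process and pass the printed URI to `DRbObject.new_with_uri`, which is what `bio/shell/web.rb` does with its fixed URI.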
@@ -0,0 +1,28 @@
+ENV["RAILS_ENV"] = "test"
+require File.expand_path(File.dirname(__FILE__) + "/../config/environment")
+require 'test_help'
+
+class Test::Unit::TestCase
+  # Transactional fixtures accelerate your tests by wrapping each test method
+  # in a transaction that's rolled back on completion. This ensures that the
+  # test database remains unchanged so your fixtures don't have to be reloaded
+  # between every test method. Fewer database queries means faster tests.
+  #
+  # Read Mike Clark's excellent walkthrough at
+  # http://clarkware.com/cgi/blosxom/2005/10/24#Rails10FastTesting
+  #
+  # Every Active Record database supports transactions except MyISAM tables
+  # in MySQL. Turn off transactional fixtures in this case; however, if you
+  # don't care one way or the other, switching from MyISAM to InnoDB tables
+  # is recommended.
+  self.use_transactional_fixtures = true
+
+  # Instantiated fixtures are slow, but give you @david where otherwise you
+  # would need people(:david). If you don't want to migrate your existing
+  # test cases which use the @david style and don't mind the speed hit (each
+  # instantiated fixtures translates to a database query per test method),
+  # then set this back to true.
+  self.use_instantiated_fixtures = false
+
+  # Add more helper methods to be used by all tests here...
+end
@@ -0,0 +1,90 @@
+#
+# = bio/shell/web.rb - GUI for the BioRuby shell
+#
+# Copyright:: Copyright (C) 2006
+# Nobuya Tanaka <t@chemruby.org>,
+# Toshiaki Katayama <k@bioruby.org>
+# License:: Ruby's
+#
+# $Id: web.rb,v 1.1 2006/02/27 09:22:42 k Exp $
+#
+
+
+module Bio::Shell
+
+  private
+
+  def rails_directory_setup
+    server = "script/server"
+    unless File.exists?(server)
+      require 'fileutils'
+      basedir = File.dirname(__FILE__)
+      print "Copying web server files ... "
+      FileUtils.cp_r("#{basedir}/rails/.", ".")
+      puts "done"
+    end
+  end
+
+  def rails_server_setup
+    require 'open3'
+    $web_server = Open3.popen3(server)
+
+    $web_error_log = File.open("log/web-error.log", "a")
+    $web_server[2].reopen($web_error_log)
+
+    while line = $web_server[1].gets
+      if line[/druby:\/\/localhost/]
+        uri = line.chomp
+        puts uri if $DEBUG
+        break
+      end
+    end
+
+    $web_access_log = File.open("log/web-access.log", "a")
+    $web_server[1].reopen($web_access_log)
+
+    return uri
+  end
+
+  def web
+    return if $web_server
+
+    require 'drb/drb'
+    # $SAFE = 1 # disable eval() and friends
+
+    rails_directory_setup
+    #uri = rails_server_setup
+    uri = 'druby://localhost:81064' # baioroji-
+
+    $drb_server = DRbObject.new_with_uri(uri)
+    $drb_server.puts_remote("Connected")
+
+    puts "Connected to server #{uri}"
+    puts "Open http://localhost:3000/shell/"
+
+    io = IRB.conf[:MAIN_CONTEXT].io
+
+    io.class.class_eval do
+      alias_method :shell_original_gets, :gets
+    end
+
+    def io.gets
+      bind = IRB.conf[:MAIN_CONTEXT].workspace.binding
+      vars = eval("local_variables", bind)
+      vars.each do |var|
+        next if var == "_"
+        if val = eval("#{var}", bind)
+          $drb_server[var] = val
+        else
+          $drb_server.delete(var)
+        end
+      end
+      line = shell_original_gets
+      line
+    end
+  end
+
+end
+
+
+
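The `web` method above wraps the shell's `gets` so that every prompt first copies the session's local variables to the DRb server. A minimal sketch of the same wrapping technique, assuming a modern Ruby where `Binding#local_variables` returns symbols and `Binding#local_variable_get` replaces the string `eval`, might look as follows. `mirror_locals_on_gets` and `demo_mirror` are hypothetical names, a `StringIO` stands in for IRB's input object, and a plain `Hash` stands in for `$drb_server`.

```ruby
require 'stringio'

# Wrap io.gets so that, before each read, the locals visible through +bind+
# are copied into +store+ (a Hash standing in for the DRb front object).
def mirror_locals_on_gets(io, bind, store)
  # Keep the original gets reachable under another name, on this object only.
  io.singleton_class.send(:alias_method, :shell_original_gets, :gets)

  io.define_singleton_method(:gets) do |*args|
    bind.local_variables.each do |name|
      next if name == :_                 # skip IRB's "last value" variable
      value = bind.local_variable_get(name)
      if value
        store[name] = value              # publish truthy values
      else
        store.delete(name)               # drop nil/false, as the original does
      end
    end
    shell_original_gets(*args)           # then read input as usual
  end
  io
end

# Small demonstration with two locals, one of them nil.
def demo_mirror
  seq = 'atgc'
  unused = nil
  store = {}
  io = mirror_locals_on_gets(StringIO.new("first\n"), binding, store)
  line = io.gets
  [line, store[:seq], store.key?(:unused)]
end

p demo_mirror
```

Aliasing on the singleton class rather than on `io.class` keeps the change local to one object, which is a design choice of this sketch and differs from the `class_eval` used in the diff.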
@@ -1,14 +1,236 @@
-module Bio
-
 #
-# bio/util/contingency_table.rb - Statistical contingency table analysis for aligned sequences
+# = bio/util/contingency_table.rb - Statistical contingency table analysis for aligned sequences
 #
 # Copyright:: Copyright (C) 2005 Trevor Wennblom <trevor@corevx.com>
 # License:: LGPL
 #
-# $Id: contingency_table.rb,v 1.
-#
+# $Id: contingency_table.rb,v 1.4 2006/02/27 13:23:01 k Exp $
 #
+# == Synopsis
+#
+# The Bio::ContingencyTable class provides basic statistical contingency table
+# analysis for two positions within aligned sequences.
+#
+# When ContingencyTable is instantiated the set of characters in the
+# aligned sequences may be passed to it as an array. This is
+# important since it uses these characters to create the table's rows
+# and columns. If this array is not passed it will use it's default
+# of an amino acid and nucleotide alphabet in lowercase along with the
+# clustal spacer '-'.
+#
+# To get data from the table the most used functions will be
+# chi_square and contingency_coefficient:
+#
+# ctable = Bio::ContingencyTable.new()
+# ctable['a']['t'] += 1
+# # .. put more values into the table
+# puts ctable.chi_square
+# puts ctable.contingency_coefficient # between 0.0 and 1.0
+#
+# The contingency_coefficient represents the degree of correlation of
+# change between two sequence positions in a multiple-sequence
+# alignment. 0.0 indicates no correlation, 1.0 is the maximum
+# correlation.
+#
+#
+# == Further Reading
+#
+# * http://en.wikipedia.org/wiki/Contingency_table
+# * http://www.physics.csbsju.edu/stats/exact.details.html
+# * Numerical Recipes in C by Press, Flannery, Teukolsky, and Vetterling
+# #
+# == Usage
+#
+# What follows is an example of ContingencyTable in typical usage
+# analyzing results from a clustal alignment.
+#
+# require 'bio'
+# require 'bio/contingency_table'
+#
+# seqs = {}
+# max_length = 0
+# Bio::ClustalW::Report.new( IO.read('sample.aln') ).to_a.each do |entry|
+# data = entry.data.strip
+# seqs[entry.definition] = data.downcase
+# max_length = data.size if max_length == 0
+# raise "Aligned sequences must be the same length!" unless data.size == max_length
+# end
+#
+# VERBOSE = true
+# puts "i\tj\tchi_square\tcontingency_coefficient" if VERBOSE
+# correlations = {}
+#
+# 0.upto(max_length - 1) do |i|
+# (i+1).upto(max_length - 1) do |j|
+# ctable = Bio::ContingencyTable.new()
+# seqs.each_value { |seq| ctable.table[ seq[i].chr ][ seq[j].chr ] += 1 }
+#
+# chi_square = ctable.chi_square
+# contingency_coefficient = ctable.contingency_coefficient
+# puts [(i+1), (j+1), chi_square, contingency_coefficient].join("\t") if VERBOSE
+#
+# correlations["#{i+1},#{j+1}"] = contingency_coefficient
+# correlations["#{j+1},#{i+1}"] = contingency_coefficient # Both ways are accurate
+# end
+# end
+#
+# require 'yaml'
+# File.new('results.yml', 'a+') { |f| f.puts correlations.to_yaml }
+#
+#
+# == Tutorial
+#
+
+# ContingencyTable returns the statistical significance of change
+# between two positions in an alignment. If you would like to see how
+# every possible combination of positions in your alignment compares
+# to one another you must set this up yourself. Hopefully the
+# provided examples will help you get started without too much
+# trouble.
+#
+# def lite_example(sequences, max_length, characters)
+#
+# %w{i j chi_square contingency_coefficient}.each { |x| print x.ljust(12) }
+# puts
+#
+# 0.upto(max_length - 1) do |i|
+# (i+1).upto(max_length - 1) do |j|
+# ctable = Bio::ContingencyTable.new( characters )
+# sequences.each do |seq|
+# i_char = seq[i].chr
+# j_char = seq[j].chr
+# ctable.table[i_char][j_char] += 1
+# end
+# chi_square = ctable.chi_square
+# contingency_coefficient = ctable.contingency_coefficient
+# [(i+1), (j+1), chi_square, contingency_coefficient].each { |x| print x.to_s.ljust(12) }
+# puts
+# end
+# end
+#
+# end
+#
+# allowed_letters = Array.new
+# allowed_letters = 'abcdefghijk'.split('')
+#
+# seqs = Array.new
+# seqs << 'abcde'
+# seqs << 'abcde'
+# seqs << 'aacje'
+# seqs << 'aacae'
+#
+# length_of_every_sequence = seqs[0].size # 5 letters long
+#
+# lite_example(seqs, length_of_every_sequence, allowed_letters)
+#
+#
+# Producing the following results:
+#
+# i j chi_square contingency_coefficient
+# 1 2 0.0 0.0
+# 1 3 0.0 0.0
+# 1 4 0.0 0.0
+# 1 5 0.0 0.0
+# 2 3 0.0 0.0
+# 2 4 4.0 0.707106781186548
+# 2 5 0.0 0.0
+# 3 4 0.0 0.0
+# 3 5 0.0 0.0
+# 4 5 0.0 0.0
+#
+# The position i=2 and j=4 has a high contingency coefficient
+# indicating that the changes at these positions are related. Note
+# that i and j are arbitrary, this could be represented as i=4 and j=2
+# since they both refer to position two and position four in the
+# alignment. Here are some more examples:
+#
+# seqs = Array.new
+# seqs << 'abcde'
+# seqs << 'abcde'
+# seqs << 'aacje'
+# seqs << 'aacae'
+# seqs << 'akcfe'
+# seqs << 'akcfe'
+#
+# length_of_every_sequence = seqs[0].size # 5 letters long
+#
+# lite_example(seqs, length_of_every_sequence, allowed_letters)
+#
+#
+# Results:
+#
+# i j chi_square contingency_coefficient
+# 1 2 0.0 0.0
+# 1 3 0.0 0.0
+# 1 4 0.0 0.0
+# 1 5 0.0 0.0
+# 2 3 0.0 0.0
+# 2 4 12.0 0.816496580927726
+# 2 5 0.0 0.0
+# 3 4 0.0 0.0
+# 3 5 0.0 0.0
+# 4 5 0.0 0.0
+#
+# Here we can see that the strength of the correlation of change has
+# increased when more data is added with correlated changes at the
+# same positions.
+#
+# seqs = Array.new
+# seqs << 'abcde'
+# seqs << 'abcde'
+# seqs << 'kacje' # changed first letter
+# seqs << 'aacae'
+# seqs << 'akcfa' # changed last letter
+# seqs << 'akcfe'
+#
+# length_of_every_sequence = seqs[0].size # 5 letters long
+#
+# lite_example(seqs, length_of_every_sequence, allowed_letters)
+#
+#
+# Results:
+#
+# i j chi_square contingency_coefficient
+# 1 2 2.4 0.534522483824849
+# 1 3 0.0 0.0
+# 1 4 6.0 0.707106781186548
+# 1 5 0.24 0.196116135138184
+# 2 3 0.0 0.0
+# 2 4 12.0 0.816496580927726
+# 2 5 2.4 0.534522483824849
+# 3 4 0.0 0.0
+# 3 5 0.0 0.0
+# 4 5 2.4 0.534522483824849
+#
+# With random changes it becomes more difficult to identify correlated
+# changes, yet positions two and four still have the highest
+# correlation as indicated by the contingency coefficient. The best
+# way to improve the accuracy of your results, as is often the case
+# with statistics, is to increase the sample size.
+#
+#
+# == A Note on Efficiency
+#
+
+# ContingencyTable is slow. It involves many calculations for even a
+# seemingly small five-string data set. Even worse, it's very
+# dependent on matrix traversal, and this is done with two dimensional
+# hashes which dashes any hope of decent speed.
+#
+
+# Finally, half of the matrix is redundant and positions could be
+# summed with their companion position to reduce calculations. For
+# example the positions (5,2) and (2,5) could both have their values
+# added together and just stored in (2,5) while (5,2) could be an
+# illegal position. Also, positions (1,1), (2,2), (3,3), etc. will
+# never be used.
+#
+# The purpose of this package is flexibility and education. The code
+# is short and to the point in aims of achieving that purpose. If the
+# BioRuby project moves towards C extensions in the future a
+# professional caliber version will likely be created.
+#
+#
 #--
 #
 # This library is free software; you can redistribute it and/or
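The documentation added in this hunk reports a chi-square of 4.0 and a contingency coefficient of about 0.7071 for positions 2 and 4 of the first example. The sketch below is a plain-Ruby illustration, not the `Bio::ContingencyTable` implementation: it computes the Pearson chi-square statistic and the coefficient C = sqrt(chi2 / (chi2 + N)), a formula that appears consistent with the documented numbers. `contingency_stats` is a hypothetical helper name, and only characters that actually occur are tabulated.

```ruby
# Pearson chi-square and contingency coefficient for two alignment columns.
# +pairs+ is an array of [char_at_i, char_at_j], one entry per sequence.
def contingency_stats(pairs)
  n = pairs.size.to_f
  observed   = Hash.new(0)   # counts per (row, column) cell
  row_totals = Hash.new(0)   # counts per character at position i
  col_totals = Hash.new(0)   # counts per character at position j

  pairs.each do |a, b|
    observed[[a, b]] += 1
    row_totals[a] += 1
    col_totals[b] += 1
  end

  # Sum (observed - expected)^2 / expected over every cell of the table.
  chi_square = 0.0
  row_totals.each do |a, r|
    col_totals.each do |b, c|
      expected = r * c / n
      chi_square += (observed[[a, b]] - expected)**2 / expected
    end
  end

  coefficient = Math.sqrt(chi_square / (chi_square + n))
  [chi_square, coefficient]
end

seqs  = %w[abcde abcde aacje aacae]
pairs = seqs.map { |s| [s[1], s[3]] }   # positions 2 and 4 (1-based)
chi, c = contingency_stats(pairs)
puts format('%.1f %.6f', chi, c)        # prints "4.0 0.707107"
```

On Ruby 1.9 and later `s[1]` already returns a one-character string, so the `seq[i].chr` idiom from the original examples is not needed here.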
@@ -29,225 +251,7 @@ module Bio
 #
 #
 
-
-bio/util/contingency_table.rb - Statistical contingency table analysis for aligned sequences
-
-== Synopsis
-
-The Bio::ContingencyTable class provides basic statistical contingency table
-analysis for two positions within aligned sequences.
-
-When ContingencyTable is instantiated the set of characters in the aligned sequences may be
-passed to it as an array. This is important since it uses these characters
-to create the table's rows and columns. If this array is not passed it will
-use it's default of an amino acid and nucleotide alphabet in lowercase along with the
-clustal spacer '-'.
-
-To get data from the table the most used functions will be chi_square and contingency_coefficient:
-ctable = Bio::ContingencyTable.new()
-ctable['a']['t'] += 1
-# .. put more values into the table
-puts ctable.chi_square
-puts ctable.contingency_coefficient # between 0.0 and 1.0
-
-The contingency_coefficient represents the degree of correlation of change between two
-sequence positions in a multiple-sequence alignment. 0.0 indicates no correlation, 1.0 is the
-maximum correlation.
-
-
-== Further Reading
-
-* http://en.wikipedia.org/wiki/Contingency_table
-* http://www.physics.csbsju.edu/stats/exact.details.html
-* Numerical Recipes in C by Press, Flannery, Teukolsky, and Vetterling
-
-
-== Usage
-
-What follows is an example of ContingencyTable in typical usage analyzing results from a clustal alignment.
-
-require 'bio'
-require 'bio/contingency_table'
-
-seqs = {}
-max_length = 0
-Bio::ClustalW::Report.new( IO.read('sample.aln') ).to_a.each do |entry|
-data = entry.data.strip
-seqs[entry.definition] = data.downcase
-max_length = data.size if max_length == 0
-raise "Aligned sequences must be the same length!" unless data.size == max_length
-end
-
-VERBOSE = true
-puts "i\tj\tchi_square\tcontingency_coefficient" if VERBOSE
-correlations = {}
-
-0.upto(max_length - 1) do |i|
-(i+1).upto(max_length - 1) do |j|
-ctable = Bio::ContingencyTable.new()
-seqs.each_value { |seq| ctable.table[ seq[i].chr ][ seq[j].chr ] += 1 }
-
-chi_square = ctable.chi_square
-contingency_coefficient = ctable.contingency_coefficient
-puts [(i+1), (j+1), chi_square, contingency_coefficient].join("\t") if VERBOSE
-
-correlations["#{i+1},#{j+1}"] = contingency_coefficient
-correlations["#{j+1},#{i+1}"] = contingency_coefficient # Both ways are accurate
-end
-end
-
-require 'yaml'
-File.new('results.yml', 'a+') { |f| f.puts correlations.to_yaml }
-
-
-== Tutorial
-
-ContingencyTable returns the statistical significance of change between two positions in an alignment.
-If you would like to see how every possible combination of positions in your alignment compares to one another
-you must set this up yourself. Hopefully the provided examples will help you get started without
-too much trouble.
-
-def lite_example(sequences, max_length, characters)
-
-%w{i j chi_square contingency_coefficient}.each { |x| print x.ljust(12) }
-puts
-
-0.upto(max_length - 1) do |i|
-(i+1).upto(max_length - 1) do |j|
-ctable = Bio::ContingencyTable.new( characters )
-sequences.each do |seq|
-i_char = seq[i].chr
-j_char = seq[j].chr
-ctable.table[i_char][j_char] += 1
-end
-chi_square = ctable.chi_square
-contingency_coefficient = ctable.contingency_coefficient
-[(i+1), (j+1), chi_square, contingency_coefficient].each { |x| print x.to_s.ljust(12) }
-puts
-end
-end
-
-end
-
-allowed_letters = Array.new
-allowed_letters = 'abcdefghijk'.split('')
-
-seqs = Array.new
-seqs << 'abcde'
-seqs << 'abcde'
-seqs << 'aacje'
-seqs << 'aacae'
-
-length_of_every_sequence = seqs[0].size # 5 letters long
-
-lite_example(seqs, length_of_every_sequence, allowed_letters)
-
-
-Producing the following results:
-
-i j chi_square contingency_coefficient
-1 2 0.0 0.0
-1 3 0.0 0.0
-1 4 0.0 0.0
-1 5 0.0 0.0
-2 3 0.0 0.0
-2 4 4.0 0.707106781186548
-2 5 0.0 0.0
-3 4 0.0 0.0
-3 5 0.0 0.0
-4 5 0.0 0.0
-
-The position i=2 and j=4 has a high contingency coefficient indicating that the changes at these
-positions are related. Note that i and j are arbitrary, this could be represented as i=4 and j=2
-since they both refer to position two and position four in the alignment. Here are some more examples:
-
-seqs = Array.new
-seqs << 'abcde'
-seqs << 'abcde'
-seqs << 'aacje'
-seqs << 'aacae'
-seqs << 'akcfe'
-seqs << 'akcfe'
-
-length_of_every_sequence = seqs[0].size # 5 letters long
-
-lite_example(seqs, length_of_every_sequence, allowed_letters)
-
-
-Results:
-
-i j chi_square contingency_coefficient
-1 2 0.0 0.0
-1 3 0.0 0.0
-1 4 0.0 0.0
-1 5 0.0 0.0
-2 3 0.0 0.0
-2 4 12.0 0.816496580927726
-2 5 0.0 0.0
-3 4 0.0 0.0
-3 5 0.0 0.0
-4 5 0.0 0.0
-
-Here we can see that the strength of the correlation of change has increased when more data is added with correlated changes at the same positions.
-
-seqs = Array.new
-seqs << 'abcde'
-seqs << 'abcde'
-seqs << 'kacje' # changed first letter
-seqs << 'aacae'
-seqs << 'akcfa' # changed last letter
-seqs << 'akcfe'
-
-length_of_every_sequence = seqs[0].size # 5 letters long
-
-lite_example(seqs, length_of_every_sequence, allowed_letters)
-
-
-Results:
-
-i j chi_square contingency_coefficient
-1 2 2.4 0.534522483824849
-1 3 0.0 0.0
-1 4 6.0 0.707106781186548
-1 5 0.24 0.196116135138184
-2 3 0.0 0.0
-2 4 12.0 0.816496580927726
-2 5 2.4 0.534522483824849
-3 4 0.0 0.0
-3 5 0.0 0.0
-4 5 2.4 0.534522483824849
-
-With random changes it becomes more difficult to identify correlated changes, yet positions two
-and four still have the highest correlation as indicated by the contingency coefficient. The
-best way to improve the accuracy of your results, as is often the case with statistics, is to
-increase the sample size.
-
-
-== A Note on Efficiency
-
-ContingencyTable is slow. It involves many calculations for even a seemingly small five-string data set.
-Even worse, it's very dependent on matrix traversal, and this is done with two dimensional hashes which
-dashes any hope of decent speed.
-
-Finally, half of the matrix is redundant and positions could be summed with their companion position to reduce
-calculations. For example the positions (5,2) and (2,5) could both have their values added together and
-just stored in (2,5) while (5,2) could be an illegal position. Also, positions (1,1), (2,2), (3,3), etc.
-will never be used.
-
-The purpose of this package is flexibility and education. The code is short and to the point in
-aims of achieving that purpose. If the BioRuby project moves towards C extensions in the future a
-professional caliber version will likely be created.
-
-
-== Author
-Trevor Wennblom <trevor@corevx.com>
-
-
-== Copyright
-Copyright (C) 2005 Trevor Wennblom
-Licensed under the same terms as BioRuby.
-
-=end
+module Bio
 
 class ContingencyTable
 # Since we're making this math-notation friendly here is the layout of @table:
@@ -334,4 +338,6 @@ class ContingencyTable
 end
 
 end
-
+
+end # Bio
+
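The efficiency note in `contingency_table.rb` observes that (5,2) and (2,5) describe the same pair of positions and that (1,1), (2,2), and so on are never used. One possible way to act on that, sketched here with hypothetical names (`PairTable`, `each_position_pair`) that do not exist in BioRuby, is to store each unordered pair once under a canonical `[low, high]` key and to enumerate pairs with `Array#combination`.

```ruby
# Store one value per unordered pair of positions, keyed by [low, high].
class PairTable
  def initialize
    @values = {}
  end

  # table[i, j] = value; the order of i and j does not matter.
  def []=(i, j, value)
    raise ArgumentError, 'a position is never paired with itself' if i == j
    @values[[i, j].minmax] = value
  end

  # table[i, j] and table[j, i] read the same entry.
  def [](i, j)
    @values[[i, j].minmax]
  end

  def size
    @values.size
  end
end

# Enumerate every pair of 1-based positions exactly once (i < j).
def each_position_pair(length)
  (1..length).to_a.combination(2)
end

table = PairTable.new
each_position_pair(5).each { |i, j| table[i, j] = "#{i},#{j}" }
puts table.size      # 10 pairs for 5 positions
puts table[5, 2]     # same entry as table[2, 5]
```

For an alignment of length 5 this yields the same ten (i, j) rows that the tutorial's result tables list, without storing each coefficient twice as the usage example does.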