bio 0.7.1 → 1.0.0

Files changed (142)
  1. data/bin/bioruby +71 -27
  2. data/bin/br_biofetch.rb +5 -17
  3. data/bin/br_bioflat.rb +14 -26
  4. data/bin/br_biogetseq.rb +6 -18
  5. data/bin/br_pmfetch.rb +6 -16
  6. data/doc/Changes-0.7.rd +35 -0
  7. data/doc/KEGG_API.rd +287 -172
  8. data/doc/KEGG_API.rd.ja +273 -160
  9. data/doc/Tutorial.rd +18 -9
  10. data/doc/Tutorial.rd.ja +656 -138
  11. data/lib/bio.rb +6 -24
  12. data/lib/bio/alignment.rb +5 -5
  13. data/lib/bio/appl/blast.rb +132 -98
  14. data/lib/bio/appl/blast/format0.rb +9 -19
  15. data/lib/bio/appl/blast/wublast.rb +5 -18
  16. data/lib/bio/appl/emboss.rb +40 -47
  17. data/lib/bio/appl/hmmer.rb +116 -82
  18. data/lib/bio/appl/hmmer/report.rb +509 -364
  19. data/lib/bio/appl/spidey/report.rb +7 -18
  20. data/lib/bio/data/na.rb +3 -21
  21. data/lib/bio/db.rb +3 -21
  22. data/lib/bio/db/aaindex.rb +147 -52
  23. data/lib/bio/db/embl/common.rb +27 -6
  24. data/lib/bio/db/embl/embl.rb +18 -10
  25. data/lib/bio/db/embl/sptr.rb +87 -67
  26. data/lib/bio/db/embl/swissprot.rb +32 -3
  27. data/lib/bio/db/embl/trembl.rb +32 -3
  28. data/lib/bio/db/embl/uniprot.rb +32 -3
  29. data/lib/bio/db/fasta.rb +327 -289
  30. data/lib/bio/db/medline.rb +25 -4
  31. data/lib/bio/db/nbrf.rb +12 -20
  32. data/lib/bio/db/pdb.rb +4 -1
  33. data/lib/bio/db/pdb/chemicalcomponent.rb +240 -0
  34. data/lib/bio/db/pdb/pdb.rb +13 -8
  35. data/lib/bio/db/rebase.rb +93 -97
  36. data/lib/bio/feature.rb +2 -31
  37. data/lib/bio/io/ddbjxml.rb +167 -139
  38. data/lib/bio/io/fastacmd.rb +89 -56
  39. data/lib/bio/io/flatfile.rb +994 -278
  40. data/lib/bio/io/flatfile/index.rb +257 -194
  41. data/lib/bio/io/flatfile/indexer.rb +37 -29
  42. data/lib/bio/reference.rb +147 -64
  43. data/lib/bio/sequence.rb +57 -417
  44. data/lib/bio/sequence/aa.rb +64 -0
  45. data/lib/bio/sequence/common.rb +175 -0
  46. data/lib/bio/sequence/compat.rb +68 -0
  47. data/lib/bio/sequence/format.rb +134 -0
  48. data/lib/bio/sequence/generic.rb +24 -0
  49. data/lib/bio/sequence/na.rb +189 -0
  50. data/lib/bio/shell.rb +9 -23
  51. data/lib/bio/shell/core.rb +130 -125
  52. data/lib/bio/shell/demo.rb +143 -0
  53. data/lib/bio/shell/{session.rb → interface.rb} +42 -40
  54. data/lib/bio/shell/object.rb +52 -0
  55. data/lib/bio/shell/plugin/codon.rb +4 -22
  56. data/lib/bio/shell/plugin/emboss.rb +23 -0
  57. data/lib/bio/shell/plugin/entry.rb +34 -25
  58. data/lib/bio/shell/plugin/flatfile.rb +5 -23
  59. data/lib/bio/shell/plugin/keggapi.rb +11 -24
  60. data/lib/bio/shell/plugin/midi.rb +5 -23
  61. data/lib/bio/shell/plugin/obda.rb +4 -22
  62. data/lib/bio/shell/plugin/seq.rb +6 -24
  63. data/lib/bio/shell/rails/Rakefile +10 -0
  64. data/lib/bio/shell/rails/app/controllers/application.rb +4 -0
  65. data/lib/bio/shell/rails/app/controllers/shell_controller.rb +94 -0
  66. data/lib/bio/shell/rails/app/helpers/application_helper.rb +3 -0
  67. data/lib/bio/shell/rails/app/models/shell_connection.rb +30 -0
  68. data/lib/bio/shell/rails/app/views/layouts/shell.rhtml +37 -0
  69. data/lib/bio/shell/rails/app/views/shell/history.rhtml +5 -0
  70. data/lib/bio/shell/rails/app/views/shell/index.rhtml +2 -0
  71. data/lib/bio/shell/rails/app/views/shell/show.rhtml +13 -0
  72. data/lib/bio/shell/rails/config/boot.rb +19 -0
  73. data/lib/bio/shell/rails/config/database.yml +85 -0
  74. data/lib/bio/shell/rails/config/environment.rb +53 -0
  75. data/lib/bio/shell/rails/config/environments/development.rb +19 -0
  76. data/lib/bio/shell/rails/config/environments/production.rb +19 -0
  77. data/lib/bio/shell/rails/config/environments/test.rb +19 -0
  78. data/lib/bio/shell/rails/config/routes.rb +19 -0
  79. data/lib/bio/shell/rails/doc/README_FOR_APP +2 -0
  80. data/lib/bio/shell/rails/public/404.html +8 -0
  81. data/lib/bio/shell/rails/public/500.html +8 -0
  82. data/lib/bio/shell/rails/public/dispatch.cgi +10 -0
  83. data/lib/bio/shell/rails/public/dispatch.fcgi +24 -0
  84. data/lib/bio/shell/rails/public/dispatch.rb +10 -0
  85. data/lib/bio/shell/rails/public/favicon.ico +0 -0
  86. data/lib/bio/shell/rails/public/images/icon.png +0 -0
  87. data/lib/bio/shell/rails/public/images/rails.png +0 -0
  88. data/lib/bio/shell/rails/public/index.html +277 -0
  89. data/lib/bio/shell/rails/public/javascripts/controls.js +750 -0
  90. data/lib/bio/shell/rails/public/javascripts/dragdrop.js +584 -0
  91. data/lib/bio/shell/rails/public/javascripts/effects.js +854 -0
  92. data/lib/bio/shell/rails/public/javascripts/prototype.js +1785 -0
  93. data/lib/bio/shell/rails/public/robots.txt +1 -0
  94. data/lib/bio/shell/rails/public/stylesheets/main.css +187 -0
  95. data/lib/bio/shell/rails/script/about +3 -0
  96. data/lib/bio/shell/rails/script/breakpointer +3 -0
  97. data/lib/bio/shell/rails/script/console +3 -0
  98. data/lib/bio/shell/rails/script/destroy +3 -0
  99. data/lib/bio/shell/rails/script/generate +3 -0
  100. data/lib/bio/shell/rails/script/performance/benchmarker +3 -0
  101. data/lib/bio/shell/rails/script/performance/profiler +3 -0
  102. data/lib/bio/shell/rails/script/plugin +3 -0
  103. data/lib/bio/shell/rails/script/process/reaper +3 -0
  104. data/lib/bio/shell/rails/script/process/spawner +3 -0
  105. data/lib/bio/shell/rails/script/process/spinner +3 -0
  106. data/lib/bio/shell/rails/script/runner +3 -0
  107. data/lib/bio/shell/rails/script/server +42 -0
  108. data/lib/bio/shell/rails/test/test_helper.rb +28 -0
  109. data/lib/bio/shell/web.rb +90 -0
  110. data/lib/bio/util/contingency_table.rb +231 -225
  111. data/sample/any2fasta.rb +59 -0
  112. data/test/data/HMMER/hmmpfam.out +64 -0
  113. data/test/data/HMMER/hmmsearch.out +88 -0
  114. data/test/data/aaindex/DAYM780301 +30 -0
  115. data/test/data/aaindex/PRAM900102 +20 -0
  116. data/test/data/bl2seq/cd8a_cd8b_blastp.bl2seq +53 -0
  117. data/test/data/bl2seq/cd8a_p53_e-5blastp.bl2seq +37 -0
  118. data/test/data/blast/{eco:b0002.faa → b0002.faa} +0 -0
  119. data/test/data/blast/{eco:b0002.faa.m0 → b0002.faa.m0} +2 -2
  120. data/test/data/blast/{eco:b0002.faa.m7 → b0002.faa.m7} +1 -1
  121. data/test/data/blast/{eco:b0002.faa.m8 → b0002.faa.m8} +0 -0
  122. data/test/unit/bio/appl/bl2seq/test_report.rb +134 -0
  123. data/test/unit/bio/appl/blast/test_report.rb +15 -12
  124. data/test/unit/bio/appl/blast/test_xmlparser.rb +4 -4
  125. data/test/unit/bio/appl/hmmer/test_report.rb +355 -0
  126. data/test/unit/bio/appl/test_blast.rb +5 -5
  127. data/test/unit/bio/data/test_na.rb +9 -18
  128. data/test/unit/bio/db/pdb/test_pdb.rb +169 -0
  129. data/test/unit/bio/db/test_aaindex.rb +197 -0
  130. data/test/unit/bio/io/test_fastacmd.rb +55 -0
  131. data/test/unit/bio/sequence/test_aa.rb +102 -0
  132. data/test/unit/bio/sequence/test_common.rb +178 -0
  133. data/test/unit/bio/sequence/test_compat.rb +82 -0
  134. data/test/unit/bio/sequence/test_na.rb +242 -0
  135. data/test/unit/bio/shell/plugin/test_seq.rb +29 -19
  136. data/test/unit/bio/test_alignment.rb +15 -7
  137. data/test/unit/bio/test_reference.rb +198 -0
  138. data/test/unit/bio/test_sequence.rb +4 -49
  139. data/test/unit/bio/test_shell.rb +2 -2
  140. metadata +118 -15
  141. data/lib/bio/io/brdb.rb +0 -103
  142. data/lib/bioruby.rb +0 -34
@@ -0,0 +1 @@
1
+ # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
@@ -0,0 +1,187 @@
1
+ body { background-color: #fff; color: #333; }
2
+
3
+ body, p, td {
4
+ font-family: verdana, arial, helvetica, sans-serif;
5
+ font-size: 13px;
6
+ line-height: 18px;
7
+ }
8
+
9
+ pre {
10
+ background-color: #eee;
11
+ padding: 10px;
12
+ font-size: 11px;
13
+ }
14
+
15
+ a { color: #000; }
16
+ a:visited { color: #666; }
17
+ a:hover { color: #fff; background-color:#000; }
18
+
19
+ .fieldWithErrors {
20
+ padding: 2px;
21
+ background-color: red;
22
+ display: table;
23
+ }
24
+
25
+ table {
26
+ text-align: top;
27
+ }
28
+
29
+ #ErrorExplanation {
30
+ width: 400px;
31
+ border: 2px solid red;
32
+ padding: 7px;
33
+ padding-bottom: 12px;
34
+ margin-bottom: 20px;
35
+ background-color: #f0f0f0;
36
+ }
37
+
38
+ #ErrorExplanation h2 {
39
+ text-align: left;
40
+ font-weight: bold;
41
+ padding: 5px 5px 5px 15px;
42
+ font-size: 12px;
43
+ margin: -7px;
44
+ background-color: #c00;
45
+ color: #fff;
46
+ }
47
+
48
+ #ErrorExplanation p {
49
+ color: #333;
50
+ margin-bottom: 0;
51
+ padding: 5px;
52
+ }
53
+
54
+ #ErrorExplanation ul li {
55
+ font-size: 12px;
56
+ list-style: square;
57
+ }
58
+
59
+ h1{
60
+ color: #333;
61
+ padding: 10px;
62
+ margin: 12px;
63
+ }
64
+
65
+ #banner{
66
+ margin-left: 5em;
67
+ margin-right: -6px;
68
+ text-align: center;
69
+ font-size: 30px;
70
+ border-top: 1px solid silver;
71
+ border-bottom: 1px solid silver;
72
+ padding: 10px 0px 10px 0px;
73
+ }
74
+
75
+ tr{
76
+ vertical-align: top;
77
+ text-align: left;
78
+ }
79
+
80
+ #side img{
81
+ background-color: black;
82
+ width:12em;
83
+ }
84
+
85
+ #side{
86
+ position: absolute;
87
+ margin: -13px;
88
+ top: 1em;
89
+ left: 1em;
90
+ width: 10em;
91
+ }
92
+
93
+ #side ul{
94
+ font-size: 13px;
95
+ margin: 0em;
96
+ }
97
+
98
+ #side h2{
99
+ font-size: 12px;
100
+ text-align: center;
101
+ width: 13em;
102
+ background-color: black;
103
+ color: white;
104
+ }
105
+ #side input{
106
+ text-align: center;
107
+ }
108
+
109
+ #main {
110
+ margin-left: 10em;
111
+ padding-top: 4px;
112
+ padding-left: 2em;
113
+ background: white;
114
+ }
115
+
116
+ .main {
117
+ margin-left: 10em;
118
+ padding-top: 4px;
119
+ padding-left: 2em;
120
+ background: white;
121
+ }
122
+
123
+ #menu {
124
+ margin-left: 10em;
125
+ padding-top: 4px;
126
+ padding-left: 2em;
127
+ background: white;
128
+ }
129
+
130
+ div.uploadStatus {
131
+ margin: 5px;
132
+ }
133
+
134
+ div.progressBar {
135
+ margin: 5px;
136
+ }
137
+
138
+ div.progressBar div.border {
139
+ background-color: #fff;
140
+ border: 1px solid grey;
141
+ width: 100%;
142
+ }
143
+
144
+ div.progressBar div.background {
145
+ background-color: #333;
146
+ height: 18px;
147
+ width: 0%;
148
+ }
149
+
150
+ .tabs {
151
+ position:relative;
152
+ height: 20px;
153
+ margin: 0;
154
+ padding: 0;
155
+ background: #aaa repeat-x;
156
+ overflow:hidden
157
+ }
158
+
159
+ .tabs li {
160
+ display:inline;
161
+ }
162
+
163
+ .tabs a:hover, .tabs a.tab-active {
164
+ color:#333;
165
+ background:#fff url("bar_on.gif") repeat-x;
166
+ border-right: 1px solid #fff
167
+ }
168
+
169
+
170
+ .tabs a {
171
+ height: 27px;
172
+ font:12px verdana, helvetica, sans-serif;
173
+ font-weight:bold;
174
+ position:relative;
175
+ padding:6px 10px 10px 10px;
176
+ margin: 0px -4px 0px 0px;
177
+ color:#333;
178
+ text-decoration:none;
179
+ border-left:1px solid #fff;
180
+ border-right:1px solid #333;
181
+ }
182
+ .tab-container {
183
+ background: #fff;
184
+ border:1px solid #555;
185
+ }
186
+ .tab-panes {
187
+ margin: 3px }
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../config/boot'
3
+ require 'commands/about'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../config/boot'
3
+ require 'commands/breakpointer'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../config/boot'
3
+ require 'commands/console'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../config/boot'
3
+ require 'commands/destroy'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../config/boot'
3
+ require 'commands/generate'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../../config/boot'
3
+ require 'commands/performance/benchmarker'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../../config/boot'
3
+ require 'commands/performance/profiler'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../config/boot'
3
+ require 'commands/plugin'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../../config/boot'
3
+ require 'commands/process/reaper'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../../config/boot'
3
+ require 'commands/process/spawner'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../../config/boot'
3
+ require 'commands/process/spinner'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../config/boot'
3
+ require 'commands/runner'
@@ -0,0 +1,42 @@
1
+ #!/usr/bin/env ruby
2
+ #
3
+ # = BioRuby shell on Rails server - GUI for the BioRuby shell
4
+ #
5
+ # Copyright:: Copyright (C) 2006
6
+ # Nobuya Tanaka <t@chemruby.org>,
7
+ # Toshiaki Katayama <k@bioruby.org>
8
+ # License:: Ruby's
9
+ #
10
+ # $Id: server,v 1.1 2006/02/27 11:16:23 k Exp $
11
+ #
12
+
13
+ require 'bio/shell'
14
+ require 'drb/drb'
15
+
16
+ require './app/models/shell_connection'
17
+ $drb_server = ShellConnection.new
18
+
19
+ ## Access Control List
20
+ #
21
+ # require 'drb/acl'
22
+ #
23
+ # list = %w(deny all
24
+ # allow 127.0.0.1
25
+ # )
26
+ # acl = ACL.new(list, ACL::DENY_ALLOW)
27
+ # DRb.install_acl(acl)
28
+ #
29
+
30
+ STDOUT.sync = true
31
+
32
+ #uri = "druby://localhost:0"
33
+ uri = 'druby://localhost:81064' # baioroji-
34
+ DRb.start_service(uri, $drb_server)
35
+ puts DRb.uri
36
+
37
+ puts "starting ..."
38
+
39
+ require './config/boot'
40
+ require 'commands/server'
41
+
42
+ puts "exiting ..."
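The access control list that script/server leaves commented out can be tried in isolation. What follows is a minimal sketch, assuming Ruby's `drb/acl` library is installed; the rule list is the one from the script, while the two peer addresses are hypothetical values chosen only for illustration.

```ruby
require 'drb/acl'

# Same rule list as the commented-out block in script/server:
# deny everything, then allow connections from the local host only.
list = %w(deny all
          allow 127.0.0.1)
acl = ACL.new(list, ACL::DENY_ALLOW)

# ACL#allow_addr? expects an array shaped like Socket#peeraddr:
# [address_family, port, hostname, ip_address].
# Both peers below are made up for this sketch.
local_peer  = ['AF_INET', 50000, 'localhost', '127.0.0.1']
remote_peer = ['AF_INET', 50000, 'example.org', '192.0.2.10']

puts acl.allow_addr?(local_peer)
puts acl.allow_addr?(remote_peer)

# To enforce the list, it would be installed before DRb.start_service:
# DRb.install_acl(acl)
```

With `ACL::DENY_ALLOW`, the allow entries are checked first, so a peer matching `allow 127.0.0.1` is accepted even though `deny all` also matches it.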
@@ -0,0 +1,28 @@
1
+ ENV["RAILS_ENV"] = "test"
2
+ require File.expand_path(File.dirname(__FILE__) + "/../config/environment")
3
+ require 'test_help'
4
+
5
+ class Test::Unit::TestCase
6
+ # Transactional fixtures accelerate your tests by wrapping each test method
7
+ # in a transaction that's rolled back on completion. This ensures that the
8
+ # test database remains unchanged so your fixtures don't have to be reloaded
9
+ # between every test method. Fewer database queries means faster tests.
10
+ #
11
+ # Read Mike Clark's excellent walkthrough at
12
+ # http://clarkware.com/cgi/blosxom/2005/10/24#Rails10FastTesting
13
+ #
14
+ # Every Active Record database supports transactions except MyISAM tables
15
+ # in MySQL. Turn off transactional fixtures in this case; however, if you
16
+ # don't care one way or the other, switching from MyISAM to InnoDB tables
17
+ # is recommended.
18
+ self.use_transactional_fixtures = true
19
+
20
+ # Instantiated fixtures are slow, but give you @david where otherwise you
21
+ # would need people(:david). If you don't want to migrate your existing
22
+ # test cases which use the @david style and don't mind the speed hit (each
23
+ # instantiated fixtures translates to a database query per test method),
24
+ # then set this back to true.
25
+ self.use_instantiated_fixtures = false
26
+
27
+ # Add more helper methods to be used by all tests here...
28
+ end
@@ -0,0 +1,90 @@
1
+ #
2
+ # = bio/shell/web.rb - GUI for the BioRuby shell
3
+ #
4
+ # Copyright:: Copyright (C) 2006
5
+ # Nobuya Tanaka <t@chemruby.org>,
6
+ # Toshiaki Katayama <k@bioruby.org>
7
+ # License:: Ruby's
8
+ #
9
+ # $Id: web.rb,v 1.1 2006/02/27 09:22:42 k Exp $
10
+ #
11
+
12
+
13
+ module Bio::Shell
14
+
15
+ private
16
+
17
+ def rails_directory_setup
18
+ server = "script/server"
19
+ unless File.exists?(server)
20
+ require 'fileutils'
21
+ basedir = File.dirname(__FILE__)
22
+ print "Copying web server files ... "
23
+ FileUtils.cp_r("#{basedir}/rails/.", ".")
24
+ puts "done"
25
+ end
26
+ end
27
+
28
+ def rails_server_setup
29
+ require 'open3'
30
+ $web_server = Open3.popen3(server)
31
+
32
+ $web_error_log = File.open("log/web-error.log", "a")
33
+ $web_server[2].reopen($web_error_log)
34
+
35
+ while line = $web_server[1].gets
36
+ if line[/druby:\/\/localhost/]
37
+ uri = line.chomp
38
+ puts uri if $DEBUG
39
+ break
40
+ end
41
+ end
42
+
43
+ $web_access_log = File.open("log/web-access.log", "a")
44
+ $web_server[1].reopen($web_access_log)
45
+
46
+ return uri
47
+ end
48
+
49
+ def web
50
+ return if $web_server
51
+
52
+ require 'drb/drb'
53
+ # $SAFE = 1 # disable eval() and friends
54
+
55
+ rails_directory_setup
56
+ #uri = rails_server_setup
57
+ uri = 'druby://localhost:81064' # baioroji-
58
+
59
+ $drb_server = DRbObject.new_with_uri(uri)
60
+ $drb_server.puts_remote("Connected")
61
+
62
+ puts "Connected to server #{uri}"
63
+ puts "Open http://localhost:3000/shell/"
64
+
65
+ io = IRB.conf[:MAIN_CONTEXT].io
66
+
67
+ io.class.class_eval do
68
+ alias_method :shell_original_gets, :gets
69
+ end
70
+
71
+ def io.gets
72
+ bind = IRB.conf[:MAIN_CONTEXT].workspace.binding
73
+ vars = eval("local_variables", bind)
74
+ vars.each do |var|
75
+ next if var == "_"
76
+ if val = eval("#{var}", bind)
77
+ $drb_server[var] = val
78
+ else
79
+ $drb_server.delete(var)
80
+ end
81
+ end
82
+ line = shell_original_gets
83
+ line
84
+ end
85
+ end
86
+
87
+ end
88
+
89
+
90
+
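The overridden `io.gets` in web.rb walks the shell's local variables through a binding before each prompt. The following is a minimal sketch of that idea for current Ruby, not the BioRuby code itself: `snapshot_locals` and `demo` are hypothetical names, a plain Hash stands in for the DRb-backed `$drb_server`, and `Binding#local_variable_get` replaces the string `eval`. On current Rubies `local_variables` returns symbols, so the comparison against `"_"` becomes a comparison against `:_`.

```ruby
# Copy the truthy local variables visible in +bind+ into a Hash,
# skipping IRB's "_" (last result) variable, as the io.gets override does.
def snapshot_locals(bind)
  store = {}
  bind.local_variables.each do |name|
    next if name == :_
    value = bind.local_variable_get(name)
    # nil and false are left out, mirroring the original `if val = ...` test
    store[name] = value if value
  end
  store
end

# Hypothetical caller: its binding exposes three local variables.
def demo
  seq = 'atgc'
  length = seq.size
  unset = nil
  snapshot_locals(binding)
end

p demo
```

Only `seq` and `length` end up in the Hash; `unset` is skipped because its value is nil.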
@@ -1,14 +1,236 @@
1
- module Bio
2
-
3
1
  #
4
- # bio/util/contingency_table.rb - Statistical contingency table analysis for aligned sequences
2
+ # = bio/util/contingency_table.rb - Statistical contingency table analysis for aligned sequences
5
3
  #
6
4
  # Copyright:: Copyright (C) 2005 Trevor Wennblom <trevor@corevx.com>
7
5
  # License:: LGPL
8
6
  #
9
- # $Id: contingency_table.rb,v 1.2 2005/12/13 14:58:37 trevor Exp $
10
- #
7
+ # $Id: contingency_table.rb,v 1.4 2006/02/27 13:23:01 k Exp $
11
8
  #
9
+ # == Synopsis
10
+ #
11
+ # The Bio::ContingencyTable class provides basic statistical contingency table
12
+ # analysis for two positions within aligned sequences.
13
+ #
14
+ # When ContingencyTable is instantiated the set of characters in the
15
+ # aligned sequences may be passed to it as an array. This is
16
+ # important since it uses these characters to create the table's rows
17
+ # and columns. If this array is not passed it will use its default
18
+ # of an amino acid and nucleotide alphabet in lowercase along with the
19
+ # clustal spacer '-'.
20
+ #
21
+ # To get data from the table the most used functions will be
22
+ # chi_square and contingency_coefficient:
23
+ #
24
+ # ctable = Bio::ContingencyTable.new()
25
+ # ctable['a']['t'] += 1
26
+ # # .. put more values into the table
27
+ # puts ctable.chi_square
28
+ # puts ctable.contingency_coefficient # between 0.0 and 1.0
29
+ #
30
+ # The contingency_coefficient represents the degree of correlation of
31
+ # change between two sequence positions in a multiple-sequence
32
+ # alignment. 0.0 indicates no correlation, 1.0 is the maximum
33
+ # correlation.
34
+ #
35
+ #
36
+ # == Further Reading
37
+ #
38
+ # * http://en.wikipedia.org/wiki/Contingency_table
39
+ # * http://www.physics.csbsju.edu/stats/exact.details.html
40
+ # * Numerical Recipes in C by Press, Flannery, Teukolsky, and Vetterling
41
+ # #
42
+ # == Usage
43
+ #
44
+ # What follows is an example of ContingencyTable in typical usage
45
+ # analyzing results from a clustal alignment.
46
+ #
47
+ # require 'bio'
48
+ # require 'bio/util/contingency_table'
49
+ #
50
+ # seqs = {}
51
+ # max_length = 0
52
+ # Bio::ClustalW::Report.new( IO.read('sample.aln') ).to_a.each do |entry|
53
+ # data = entry.data.strip
54
+ # seqs[entry.definition] = data.downcase
55
+ # max_length = data.size if max_length == 0
56
+ # raise "Aligned sequences must be the same length!" unless data.size == max_length
57
+ # end
58
+ #
59
+ # VERBOSE = true
60
+ # puts "i\tj\tchi_square\tcontingency_coefficient" if VERBOSE
61
+ # correlations = {}
62
+ #
63
+ # 0.upto(max_length - 1) do |i|
64
+ # (i+1).upto(max_length - 1) do |j|
65
+ # ctable = Bio::ContingencyTable.new()
66
+ # seqs.each_value { |seq| ctable.table[ seq[i].chr ][ seq[j].chr ] += 1 }
67
+ #
68
+ # chi_square = ctable.chi_square
69
+ # contingency_coefficient = ctable.contingency_coefficient
70
+ # puts [(i+1), (j+1), chi_square, contingency_coefficient].join("\t") if VERBOSE
71
+ #
72
+ # correlations["#{i+1},#{j+1}"] = contingency_coefficient
73
+ # correlations["#{j+1},#{i+1}"] = contingency_coefficient # Both ways are accurate
74
+ # end
75
+ # end
76
+ #
77
+ # require 'yaml'
78
+ # File.open('results.yml', 'a+') { |f| f.puts correlations.to_yaml }
79
+ #
80
+ #
81
+ # == Tutorial
82
+ #
83
+
84
+ # ContingencyTable returns the statistical significance of change
85
+ # between two positions in an alignment. If you would like to see how
86
+ # every possible combination of positions in your alignment compares
87
+ # to one another you must set this up yourself. Hopefully the
88
+ # provided examples will help you get started without too much
89
+ # trouble.
90
+ #
91
+ # def lite_example(sequences, max_length, characters)
92
+ #
93
+ # %w{i j chi_square contingency_coefficient}.each { |x| print x.ljust(12) }
94
+ # puts
95
+ #
96
+ # 0.upto(max_length - 1) do |i|
97
+ # (i+1).upto(max_length - 1) do |j|
98
+ # ctable = Bio::ContingencyTable.new( characters )
99
+ # sequences.each do |seq|
100
+ # i_char = seq[i].chr
101
+ # j_char = seq[j].chr
102
+ # ctable.table[i_char][j_char] += 1
103
+ # end
104
+ # chi_square = ctable.chi_square
105
+ # contingency_coefficient = ctable.contingency_coefficient
106
+ # [(i+1), (j+1), chi_square, contingency_coefficient].each { |x| print x.to_s.ljust(12) }
107
+ # puts
108
+ # end
109
+ # end
110
+ #
111
+ # end
112
+ #
113
+ # allowed_letters = Array.new
114
+ # allowed_letters = 'abcdefghijk'.split('')
115
+ #
116
+ # seqs = Array.new
117
+ # seqs << 'abcde'
118
+ # seqs << 'abcde'
119
+ # seqs << 'aacje'
120
+ # seqs << 'aacae'
121
+ #
122
+ # length_of_every_sequence = seqs[0].size # 5 letters long
123
+ #
124
+ # lite_example(seqs, length_of_every_sequence, allowed_letters)
125
+ #
126
+ #
127
+ # Producing the following results:
128
+ #
129
+ # i j chi_square contingency_coefficient
130
+ # 1 2 0.0 0.0
131
+ # 1 3 0.0 0.0
132
+ # 1 4 0.0 0.0
133
+ # 1 5 0.0 0.0
134
+ # 2 3 0.0 0.0
135
+ # 2 4 4.0 0.707106781186548
136
+ # 2 5 0.0 0.0
137
+ # 3 4 0.0 0.0
138
+ # 3 5 0.0 0.0
139
+ # 4 5 0.0 0.0
140
+ #
141
+ # The position i=2 and j=4 has a high contingency coefficient
142
+ # indicating that the changes at these positions are related. Note
143
+ # that i and j are arbitrary, this could be represented as i=4 and j=2
144
+ # since they both refer to position two and position four in the
145
+ # alignment. Here are some more examples:
146
+ #
147
+ # seqs = Array.new
148
+ # seqs << 'abcde'
149
+ # seqs << 'abcde'
150
+ # seqs << 'aacje'
151
+ # seqs << 'aacae'
152
+ # seqs << 'akcfe'
153
+ # seqs << 'akcfe'
154
+ #
155
+ # length_of_every_sequence = seqs[0].size # 5 letters long
156
+ #
157
+ # lite_example(seqs, length_of_every_sequence, allowed_letters)
158
+ #
159
+ #
160
+ # Results:
161
+ #
162
+ # i j chi_square contingency_coefficient
163
+ # 1 2 0.0 0.0
164
+ # 1 3 0.0 0.0
165
+ # 1 4 0.0 0.0
166
+ # 1 5 0.0 0.0
167
+ # 2 3 0.0 0.0
168
+ # 2 4 12.0 0.816496580927726
169
+ # 2 5 0.0 0.0
170
+ # 3 4 0.0 0.0
171
+ # 3 5 0.0 0.0
172
+ # 4 5 0.0 0.0
173
+ #
174
+ # Here we can see that the strength of the correlation of change has
175
+ # increased when more data is added with correlated changes at the
176
+ # same positions.
177
+ #
178
+ # seqs = Array.new
179
+ # seqs << 'abcde'
180
+ # seqs << 'abcde'
181
+ # seqs << 'kacje' # changed first letter
182
+ # seqs << 'aacae'
183
+ # seqs << 'akcfa' # changed last letter
184
+ # seqs << 'akcfe'
185
+ #
186
+ # length_of_every_sequence = seqs[0].size # 5 letters long
187
+ #
188
+ # lite_example(seqs, length_of_every_sequence, allowed_letters)
189
+ #
190
+ #
191
+ # Results:
192
+ #
193
+ # i j chi_square contingency_coefficient
194
+ # 1 2 2.4 0.534522483824849
195
+ # 1 3 0.0 0.0
196
+ # 1 4 6.0 0.707106781186548
197
+ # 1 5 0.24 0.196116135138184
198
+ # 2 3 0.0 0.0
199
+ # 2 4 12.0 0.816496580927726
200
+ # 2 5 2.4 0.534522483824849
201
+ # 3 4 0.0 0.0
202
+ # 3 5 0.0 0.0
203
+ # 4 5 2.4 0.534522483824849
204
+ #
205
+ # With random changes it becomes more difficult to identify correlated
206
+ # changes, yet positions two and four still have the highest
207
+ # correlation as indicated by the contingency coefficient. The best
208
+ # way to improve the accuracy of your results, as is often the case
209
+ # with statistics, is to increase the sample size.
210
+ #
211
+ #
212
+ # == A Note on Efficiency
213
+ #
214
+
215
+ # ContingencyTable is slow. It involves many calculations for even a
216
+ # seemingly small five-string data set. Even worse, it's very
217
+ # dependent on matrix traversal, and this is done with two dimensional
218
+ # hashes which dashes any hope of decent speed.
219
+ #
220
+
221
+ # Finally, half of the matrix is redundant and positions could be
222
+ # summed with their companion position to reduce calculations. For
223
+ # example the positions (5,2) and (2,5) could both have their values
224
+ # added together and just stored in (2,5) while (5,2) could be an
225
+ # illegal position. Also, positions (1,1), (2,2), (3,3), etc. will
226
+ # never be used.
227
+ #
228
+ # The purpose of this package is flexibility and education. The code
229
+ # is short and to the point in aims of achieving that purpose. If the
230
+ # BioRuby project moves towards C extensions in the future a
231
+ # professional caliber version will likely be created.
232
+ #
233
+ #
12
234
  #--
13
235
  #
14
236
  # This library is free software; you can redistribute it and/or
@@ -29,225 +251,7 @@ module Bio
29
251
  #
30
252
  #
31
253
 
32
- =begin rdoc
33
- bio/util/contingency_table.rb - Statistical contingency table analysis for aligned sequences
34
-
35
- == Synopsis
36
-
37
- The Bio::ContingencyTable class provides basic statistical contingency table
38
- analysis for two positions within aligned sequences.
39
-
40
- When ContingencyTable is instantiated the set of characters in the aligned sequences may be
41
- passed to it as an array. This is important since it uses these characters
42
- to create the table's rows and columns. If this array is not passed it will
43
- use it's default of an amino acid and nucleotide alphabet in lowercase along with the
44
- clustal spacer '-'.
45
-
46
- To get data from the table the most used functions will be chi_square and contingency_coefficient:
47
- ctable = Bio::ContingencyTable.new()
48
- ctable['a']['t'] += 1
49
- # .. put more values into the table
50
- puts ctable.chi_square
51
- puts ctable.contingency_coefficient # between 0.0 and 1.0
52
-
53
- The contingency_coefficient represents the degree of correlation of change between two
54
- sequence positions in a multiple-sequence alignment. 0.0 indicates no correlation, 1.0 is the
55
- maximum correlation.
56
-
57
-
58
- == Further Reading
59
-
60
- * http://en.wikipedia.org/wiki/Contingency_table
61
- * http://www.physics.csbsju.edu/stats/exact.details.html
62
- * Numerical Recipes in C by Press, Flannery, Teukolsky, and Vetterling
63
-
64
-
65
- == Usage
66
-
67
- What follows is an example of ContingencyTable in typical usage analyzing results from a clustal alignment.
68
-
69
- require 'bio'
70
- require 'bio/contingency_table'
71
-
72
- seqs = {}
73
- max_length = 0
74
- Bio::ClustalW::Report.new( IO.read('sample.aln') ).to_a.each do |entry|
75
- data = entry.data.strip
76
- seqs[entry.definition] = data.downcase
77
- max_length = data.size if max_length == 0
78
- raise "Aligned sequences must be the same length!" unless data.size == max_length
79
- end
80
-
81
- VERBOSE = true
82
- puts "i\tj\tchi_square\tcontingency_coefficient" if VERBOSE
83
- correlations = {}
84
-
85
- 0.upto(max_length - 1) do |i|
86
- (i+1).upto(max_length - 1) do |j|
87
- ctable = Bio::ContingencyTable.new()
88
- seqs.each_value { |seq| ctable.table[ seq[i].chr ][ seq[j].chr ] += 1 }
89
-
90
- chi_square = ctable.chi_square
91
- contingency_coefficient = ctable.contingency_coefficient
92
- puts [(i+1), (j+1), chi_square, contingency_coefficient].join("\t") if VERBOSE
93
-
94
- correlations["#{i+1},#{j+1}"] = contingency_coefficient
95
- correlations["#{j+1},#{i+1}"] = contingency_coefficient # Both ways are accurate
96
- end
97
- end
98
-
99
- require 'yaml'
100
- File.new('results.yml', 'a+') { |f| f.puts correlations.to_yaml }
101
-
102
-
103
- == Tutorial
104
-
105
- ContingencyTable returns the statistical significance of change between two positions in an alignment.
106
- If you would like to see how every possible combination of positions in your alignment compares to one another
107
- you must set this up yourself. Hopefully the provided examples will help you get started without
108
- too much trouble.
109
-
110
- def lite_example(sequences, max_length, characters)
111
-
112
- %w{i j chi_square contingency_coefficient}.each { |x| print x.ljust(12) }
113
- puts
114
-
115
- 0.upto(max_length - 1) do |i|
116
- (i+1).upto(max_length - 1) do |j|
117
- ctable = Bio::ContingencyTable.new( characters )
118
- sequences.each do |seq|
119
- i_char = seq[i].chr
120
- j_char = seq[j].chr
121
- ctable.table[i_char][j_char] += 1
122
- end
123
- chi_square = ctable.chi_square
124
- contingency_coefficient = ctable.contingency_coefficient
125
- [(i+1), (j+1), chi_square, contingency_coefficient].each { |x| print x.to_s.ljust(12) }
126
- puts
127
- end
128
- end
129
-
130
- end
131
-
132
- allowed_letters = Array.new
133
- allowed_letters = 'abcdefghijk'.split('')
134
-
135
- seqs = Array.new
136
- seqs << 'abcde'
137
- seqs << 'abcde'
138
- seqs << 'aacje'
139
- seqs << 'aacae'
140
-
141
- length_of_every_sequence = seqs[0].size # 5 letters long
142
-
143
- lite_example(seqs, length_of_every_sequence, allowed_letters)
144
-
145
-
146
- Producing the following results:
147
-
148
- i j chi_square contingency_coefficient
149
- 1 2 0.0 0.0
150
- 1 3 0.0 0.0
151
- 1 4 0.0 0.0
152
- 1 5 0.0 0.0
153
- 2 3 0.0 0.0
154
- 2 4 4.0 0.707106781186548
155
- 2 5 0.0 0.0
156
- 3 4 0.0 0.0
157
- 3 5 0.0 0.0
158
- 4 5 0.0 0.0
159
-
160
- The position i=2 and j=4 has a high contingency coefficient indicating that the changes at these
161
- positions are related. Note that i and j are arbitrary, this could be represented as i=4 and j=2
162
- since they both refer to position two and position four in the alignment. Here are some more examples:
163
-
164
- seqs = Array.new
165
- seqs << 'abcde'
166
- seqs << 'abcde'
167
- seqs << 'aacje'
168
- seqs << 'aacae'
169
- seqs << 'akcfe'
170
- seqs << 'akcfe'
171
-
172
- length_of_every_sequence = seqs[0].size # 5 letters long
173
-
174
- lite_example(seqs, length_of_every_sequence, allowed_letters)
175
-
176
-
177
- Results:
178
-
179
- i j chi_square contingency_coefficient
180
- 1 2 0.0 0.0
181
- 1 3 0.0 0.0
182
- 1 4 0.0 0.0
183
- 1 5 0.0 0.0
184
- 2 3 0.0 0.0
185
- 2 4 12.0 0.816496580927726
186
- 2 5 0.0 0.0
187
- 3 4 0.0 0.0
188
- 3 5 0.0 0.0
189
- 4 5 0.0 0.0
190
-
191
- Here we can see that the strength of the correlation of change has increased when more data is added with correlated changes at the same positions.
192
-
193
- seqs = Array.new
194
- seqs << 'abcde'
195
- seqs << 'abcde'
196
- seqs << 'kacje' # changed first letter
197
- seqs << 'aacae'
198
- seqs << 'akcfa' # changed last letter
199
- seqs << 'akcfe'
200
-
201
- length_of_every_sequence = seqs[0].size # 5 letters long
202
-
203
- lite_example(seqs, length_of_every_sequence, allowed_letters)
204
-
205
-
206
- Results:
207
-
208
- i j chi_square contingency_coefficient
209
- 1 2 2.4 0.534522483824849
210
- 1 3 0.0 0.0
211
- 1 4 6.0 0.707106781186548
212
- 1 5 0.24 0.196116135138184
213
- 2 3 0.0 0.0
214
- 2 4 12.0 0.816496580927726
215
- 2 5 2.4 0.534522483824849
216
- 3 4 0.0 0.0
217
- 3 5 0.0 0.0
218
- 4 5 2.4 0.534522483824849
219
-
220
- With random changes it becomes more difficult to identify correlated changes, yet positions two
221
- and four still have the highest correlation as indicated by the contingency coefficient. The
222
- best way to improve the accuracy of your results, as is often the case with statistics, is to
223
- increase the sample size.
224
-
225
-
226
- == A Note on Efficiency
227
-
228
- ContingencyTable is slow. It involves many calculations for even a seemingly small five-string data set.
229
- Even worse, it's very dependent on matrix traversal, and this is done with two dimensional hashes which
230
- dashes any hope of decent speed.
231
-
232
- Finally, half of the matrix is redundant and positions could be summed with their companion position to reduce
233
- calculations. For example the positions (5,2) and (2,5) could both have their values added together and
234
- just stored in (2,5) while (5,2) could be an illegal position. Also, positions (1,1), (2,2), (3,3), etc.
235
- will never be used.
236
-
237
- The purpose of this package is flexibility and education. The code is short and to the point in
238
- aims of achieving that purpose. If the BioRuby project moves towards C extensions in the future a
239
- professional caliber version will likely be created.
240
-
241
-
242
- == Author
243
- Trevor Wennblom <trevor@corevx.com>
244
-
245
-
246
- == Copyright
247
- Copyright (C) 2005 Trevor Wennblom
248
- Licensed under the same terms as BioRuby.
249
-
250
- =end
254
+ module Bio
251
255
 
252
256
  class ContingencyTable
253
257
  # Since we're making this math-notation friendly here is the layout of @table:
@@ -334,4 +338,6 @@ class ContingencyTable
334
338
  end
335
339
 
336
340
  end
337
- end
341
+
342
+ end # Bio
343
+
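The tutorial figures quoted in the contingency_table.rb documentation can be reproduced without BioRuby. The sketch below is not the library's implementation. It assumes the standard Pearson chi-square statistic and the contingency coefficient C = sqrt(chi2 / (chi2 + N)), which agree with the values documented for the four- and six-sequence examples. The helper names `column_pair_counts`, `chi_square` and `contingency_coefficient` are hypothetical, and only characters that actually occur are tabulated, so no expected count is zero.

```ruby
# Count how often each pair of characters occurs at columns i and j.
def column_pair_counts(seqs, i, j)
  counts = Hash.new(0)
  seqs.each { |seq| counts[[seq[i], seq[j]]] += 1 }
  counts
end

# Pearson chi-square over the rows and columns that were observed.
def chi_square(counts)
  total = counts.values.sum.to_f
  row_totals = Hash.new(0)
  col_totals = Hash.new(0)
  counts.each do |(row, col), n|
    row_totals[row] += n
    col_totals[col] += n
  end

  chi2 = 0.0
  row_totals.each do |row, row_n|
    col_totals.each do |col, col_n|
      expected = row_n * col_n / total
      chi2 += (counts[[row, col]] - expected)**2 / expected
    end
  end
  chi2
end

# Contingency coefficient: 0.0 means no association between the columns.
def contingency_coefficient(chi2, total)
  Math.sqrt(chi2 / (chi2 + total))
end

seqs = %w[abcde abcde aacje aacae]

# combination(2) yields each pair (i, j) with i < j exactly once.
(0...seqs.first.size).to_a.combination(2) do |i, j|
  chi2 = chi_square(column_pair_counts(seqs, i, j))
  c = contingency_coefficient(chi2, seqs.size)
  puts [i + 1, j + 1, chi2.round(3), c.round(3)].join("\t")
end
```

Iterating with `combination(2)` visits only the pairs with i < j, which corresponds to the efficiency note's remark that half of the position matrix is redundant and the diagonal is never used. For positions 2 and 4 of the four-sequence example this gives a chi-square of 4.0 and a coefficient of about 0.707, in line with the documented table.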