bio 0.7.1 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (142)
  1. data/bin/bioruby +71 -27
  2. data/bin/br_biofetch.rb +5 -17
  3. data/bin/br_bioflat.rb +14 -26
  4. data/bin/br_biogetseq.rb +6 -18
  5. data/bin/br_pmfetch.rb +6 -16
  6. data/doc/Changes-0.7.rd +35 -0
  7. data/doc/KEGG_API.rd +287 -172
  8. data/doc/KEGG_API.rd.ja +273 -160
  9. data/doc/Tutorial.rd +18 -9
  10. data/doc/Tutorial.rd.ja +656 -138
  11. data/lib/bio.rb +6 -24
  12. data/lib/bio/alignment.rb +5 -5
  13. data/lib/bio/appl/blast.rb +132 -98
  14. data/lib/bio/appl/blast/format0.rb +9 -19
  15. data/lib/bio/appl/blast/wublast.rb +5 -18
  16. data/lib/bio/appl/emboss.rb +40 -47
  17. data/lib/bio/appl/hmmer.rb +116 -82
  18. data/lib/bio/appl/hmmer/report.rb +509 -364
  19. data/lib/bio/appl/spidey/report.rb +7 -18
  20. data/lib/bio/data/na.rb +3 -21
  21. data/lib/bio/db.rb +3 -21
  22. data/lib/bio/db/aaindex.rb +147 -52
  23. data/lib/bio/db/embl/common.rb +27 -6
  24. data/lib/bio/db/embl/embl.rb +18 -10
  25. data/lib/bio/db/embl/sptr.rb +87 -67
  26. data/lib/bio/db/embl/swissprot.rb +32 -3
  27. data/lib/bio/db/embl/trembl.rb +32 -3
  28. data/lib/bio/db/embl/uniprot.rb +32 -3
  29. data/lib/bio/db/fasta.rb +327 -289
  30. data/lib/bio/db/medline.rb +25 -4
  31. data/lib/bio/db/nbrf.rb +12 -20
  32. data/lib/bio/db/pdb.rb +4 -1
  33. data/lib/bio/db/pdb/chemicalcomponent.rb +240 -0
  34. data/lib/bio/db/pdb/pdb.rb +13 -8
  35. data/lib/bio/db/rebase.rb +93 -97
  36. data/lib/bio/feature.rb +2 -31
  37. data/lib/bio/io/ddbjxml.rb +167 -139
  38. data/lib/bio/io/fastacmd.rb +89 -56
  39. data/lib/bio/io/flatfile.rb +994 -278
  40. data/lib/bio/io/flatfile/index.rb +257 -194
  41. data/lib/bio/io/flatfile/indexer.rb +37 -29
  42. data/lib/bio/reference.rb +147 -64
  43. data/lib/bio/sequence.rb +57 -417
  44. data/lib/bio/sequence/aa.rb +64 -0
  45. data/lib/bio/sequence/common.rb +175 -0
  46. data/lib/bio/sequence/compat.rb +68 -0
  47. data/lib/bio/sequence/format.rb +134 -0
  48. data/lib/bio/sequence/generic.rb +24 -0
  49. data/lib/bio/sequence/na.rb +189 -0
  50. data/lib/bio/shell.rb +9 -23
  51. data/lib/bio/shell/core.rb +130 -125
  52. data/lib/bio/shell/demo.rb +143 -0
  53. data/lib/bio/shell/{session.rb → interface.rb} +42 -40
  54. data/lib/bio/shell/object.rb +52 -0
  55. data/lib/bio/shell/plugin/codon.rb +4 -22
  56. data/lib/bio/shell/plugin/emboss.rb +23 -0
  57. data/lib/bio/shell/plugin/entry.rb +34 -25
  58. data/lib/bio/shell/plugin/flatfile.rb +5 -23
  59. data/lib/bio/shell/plugin/keggapi.rb +11 -24
  60. data/lib/bio/shell/plugin/midi.rb +5 -23
  61. data/lib/bio/shell/plugin/obda.rb +4 -22
  62. data/lib/bio/shell/plugin/seq.rb +6 -24
  63. data/lib/bio/shell/rails/Rakefile +10 -0
  64. data/lib/bio/shell/rails/app/controllers/application.rb +4 -0
  65. data/lib/bio/shell/rails/app/controllers/shell_controller.rb +94 -0
  66. data/lib/bio/shell/rails/app/helpers/application_helper.rb +3 -0
  67. data/lib/bio/shell/rails/app/models/shell_connection.rb +30 -0
  68. data/lib/bio/shell/rails/app/views/layouts/shell.rhtml +37 -0
  69. data/lib/bio/shell/rails/app/views/shell/history.rhtml +5 -0
  70. data/lib/bio/shell/rails/app/views/shell/index.rhtml +2 -0
  71. data/lib/bio/shell/rails/app/views/shell/show.rhtml +13 -0
  72. data/lib/bio/shell/rails/config/boot.rb +19 -0
  73. data/lib/bio/shell/rails/config/database.yml +85 -0
  74. data/lib/bio/shell/rails/config/environment.rb +53 -0
  75. data/lib/bio/shell/rails/config/environments/development.rb +19 -0
  76. data/lib/bio/shell/rails/config/environments/production.rb +19 -0
  77. data/lib/bio/shell/rails/config/environments/test.rb +19 -0
  78. data/lib/bio/shell/rails/config/routes.rb +19 -0
  79. data/lib/bio/shell/rails/doc/README_FOR_APP +2 -0
  80. data/lib/bio/shell/rails/public/404.html +8 -0
  81. data/lib/bio/shell/rails/public/500.html +8 -0
  82. data/lib/bio/shell/rails/public/dispatch.cgi +10 -0
  83. data/lib/bio/shell/rails/public/dispatch.fcgi +24 -0
  84. data/lib/bio/shell/rails/public/dispatch.rb +10 -0
  85. data/lib/bio/shell/rails/public/favicon.ico +0 -0
  86. data/lib/bio/shell/rails/public/images/icon.png +0 -0
  87. data/lib/bio/shell/rails/public/images/rails.png +0 -0
  88. data/lib/bio/shell/rails/public/index.html +277 -0
  89. data/lib/bio/shell/rails/public/javascripts/controls.js +750 -0
  90. data/lib/bio/shell/rails/public/javascripts/dragdrop.js +584 -0
  91. data/lib/bio/shell/rails/public/javascripts/effects.js +854 -0
  92. data/lib/bio/shell/rails/public/javascripts/prototype.js +1785 -0
  93. data/lib/bio/shell/rails/public/robots.txt +1 -0
  94. data/lib/bio/shell/rails/public/stylesheets/main.css +187 -0
  95. data/lib/bio/shell/rails/script/about +3 -0
  96. data/lib/bio/shell/rails/script/breakpointer +3 -0
  97. data/lib/bio/shell/rails/script/console +3 -0
  98. data/lib/bio/shell/rails/script/destroy +3 -0
  99. data/lib/bio/shell/rails/script/generate +3 -0
  100. data/lib/bio/shell/rails/script/performance/benchmarker +3 -0
  101. data/lib/bio/shell/rails/script/performance/profiler +3 -0
  102. data/lib/bio/shell/rails/script/plugin +3 -0
  103. data/lib/bio/shell/rails/script/process/reaper +3 -0
  104. data/lib/bio/shell/rails/script/process/spawner +3 -0
  105. data/lib/bio/shell/rails/script/process/spinner +3 -0
  106. data/lib/bio/shell/rails/script/runner +3 -0
  107. data/lib/bio/shell/rails/script/server +42 -0
  108. data/lib/bio/shell/rails/test/test_helper.rb +28 -0
  109. data/lib/bio/shell/web.rb +90 -0
  110. data/lib/bio/util/contingency_table.rb +231 -225
  111. data/sample/any2fasta.rb +59 -0
  112. data/test/data/HMMER/hmmpfam.out +64 -0
  113. data/test/data/HMMER/hmmsearch.out +88 -0
  114. data/test/data/aaindex/DAYM780301 +30 -0
  115. data/test/data/aaindex/PRAM900102 +20 -0
  116. data/test/data/bl2seq/cd8a_cd8b_blastp.bl2seq +53 -0
  117. data/test/data/bl2seq/cd8a_p53_e-5blastp.bl2seq +37 -0
  118. data/test/data/blast/{eco:b0002.faa → b0002.faa} +0 -0
  119. data/test/data/blast/{eco:b0002.faa.m0 → b0002.faa.m0} +2 -2
  120. data/test/data/blast/{eco:b0002.faa.m7 → b0002.faa.m7} +1 -1
  121. data/test/data/blast/{eco:b0002.faa.m8 → b0002.faa.m8} +0 -0
  122. data/test/unit/bio/appl/bl2seq/test_report.rb +134 -0
  123. data/test/unit/bio/appl/blast/test_report.rb +15 -12
  124. data/test/unit/bio/appl/blast/test_xmlparser.rb +4 -4
  125. data/test/unit/bio/appl/hmmer/test_report.rb +355 -0
  126. data/test/unit/bio/appl/test_blast.rb +5 -5
  127. data/test/unit/bio/data/test_na.rb +9 -18
  128. data/test/unit/bio/db/pdb/test_pdb.rb +169 -0
  129. data/test/unit/bio/db/test_aaindex.rb +197 -0
  130. data/test/unit/bio/io/test_fastacmd.rb +55 -0
  131. data/test/unit/bio/sequence/test_aa.rb +102 -0
  132. data/test/unit/bio/sequence/test_common.rb +178 -0
  133. data/test/unit/bio/sequence/test_compat.rb +82 -0
  134. data/test/unit/bio/sequence/test_na.rb +242 -0
  135. data/test/unit/bio/shell/plugin/test_seq.rb +29 -19
  136. data/test/unit/bio/test_alignment.rb +15 -7
  137. data/test/unit/bio/test_reference.rb +198 -0
  138. data/test/unit/bio/test_sequence.rb +4 -49
  139. data/test/unit/bio/test_shell.rb +2 -2
  140. metadata +118 -15
  141. data/lib/bio/io/brdb.rb +0 -103
  142. data/lib/bioruby.rb +0 -34
@@ -0,0 +1 @@
1
+ # See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
@@ -0,0 +1,187 @@
1
+ body { background-color: #fff; color: #333; }
2
+
3
+ body, p, td {
4
+ font-family: verdana, arial, helvetica, sans-serif;
5
+ font-size: 13px;
6
+ line-height: 18px;
7
+ }
8
+
9
+ pre {
10
+ background-color: #eee;
11
+ padding: 10px;
12
+ font-size: 11px;
13
+ }
14
+
15
+ a { color: #000; }
16
+ a:visited { color: #666; }
17
+ a:hover { color: #fff; background-color:#000; }
18
+
19
+ .fieldWithErrors {
20
+ padding: 2px;
21
+ background-color: red;
22
+ display: table;
23
+ }
24
+
25
+ table {
26
+ text-align: top;
27
+ }
28
+
29
+ #ErrorExplanation {
30
+ width: 400px;
31
+ border: 2px solid red;
32
+ padding: 7px;
33
+ padding-bottom: 12px;
34
+ margin-bottom: 20px;
35
+ background-color: #f0f0f0;
36
+ }
37
+
38
+ #ErrorExplanation h2 {
39
+ text-align: left;
40
+ font-weight: bold;
41
+ padding: 5px 5px 5px 15px;
42
+ font-size: 12px;
43
+ margin: -7px;
44
+ background-color: #c00;
45
+ color: #fff;
46
+ }
47
+
48
+ #ErrorExplanation p {
49
+ color: #333;
50
+ margin-bottom: 0;
51
+ padding: 5px;
52
+ }
53
+
54
+ #ErrorExplanation ul li {
55
+ font-size: 12px;
56
+ list-style: square;
57
+ }
58
+
59
+ h1{
60
+ color: #333;
61
+ padding: 10px;
62
+ margin: 12px;
63
+ }
64
+
65
+ #banner{
66
+ margin-left: 5em;
67
+ margin-right: -6px;
68
+ text-align: center;
69
+ font-size: 30px;
70
+ border-top: 1px solid silver;
71
+ border-bottom: 1px solid silver;
72
+ padding: 10px 0px 10px 0px;
73
+ }
74
+
75
+ tr{
76
+ vertical-align: top;
77
+ text-align: left;
78
+ }
79
+
80
+ #side img{
81
+ background-color: black;
82
+ width:12em;
83
+ }
84
+
85
+ #side{
86
+ position: absolute;
87
+ margin: -13px;
88
+ top: 1em;
89
+ left: 1em;
90
+ width: 10em;
91
+ }
92
+
93
+ #side ul{
94
+ font-size: 13px;
95
+ margin: 0em;
96
+ }
97
+
98
+ #side h2{
99
+ font-size: 12px;
100
+ text-align: center;
101
+ width: 13em;
102
+ background-color: black;
103
+ color: white;
104
+ }
105
+ #side input{
106
+ text-align: center;
107
+ }
108
+
109
+ #main {
110
+ margin-left: 10em;
111
+ padding-top: 4x;
112
+ padding-left: 2em;
113
+ background: white;
114
+ }
115
+
116
+ .main {
117
+ margin-left: 10em;
118
+ padding-top: 4x;
119
+ padding-left: 2em;
120
+ background: white;
121
+ }
122
+
123
+ #menu {
124
+ margin-left: 10em;
125
+ padding-top: 4x;
126
+ padding-left: 2em;
127
+ background: white;
128
+ }
129
+
130
+ div.uploadStatus {
131
+ margin: 5px;
132
+ }
133
+
134
+ div.progressBar {
135
+ margin: 5px;
136
+ }
137
+
138
+ div.progressBar div.border {
139
+ background-color: #fff;
140
+ border: 1px solid grey;
141
+ width: 100%;
142
+ }
143
+
144
+ div.progressBar div.background {
145
+ background-color: #333;
146
+ height: 18px;
147
+ width: 0%;
148
+ }
149
+
150
+ .tabs {
151
+ position:relative;
152
+ height: 20px;
153
+ margin: 0;
154
+ padding: 0;
155
+ background: #aaa repeat-x;
156
+ overflow:hidden
157
+ }
158
+
159
+ .tabs li {
160
+ display:inline;
161
+ }
162
+
163
+ .tabs a:hover, .tabs a.tab-active {
164
+ color:#333;
165
+ background:#fff url("bar_on.gif") repeat-x;
166
+ border-right: 1px solid #fff
167
+ }
168
+
169
+
170
+ .tabs a {
171
+ height: 27px;
172
+ font:12px verdana, helvetica, sans-serif;
173
+ font-weight:bold;
174
+ position:relative;
175
+ padding:6px 10px 10px 10px;
176
+ margin: 0px -4px 0px 0px;
177
+ color:#333;
178
+ text-decoration:none;
179
+ border-left:1px solid #fff;
180
+ border-right:1px solid #333;
181
+ }
182
+ .tab-container {
183
+ background: #fff;
184
+ border:1px solid #555;
185
+ }
186
+ .tab-panes {
187
+ margin: 3px }
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../config/boot'
3
+ require 'commands/about'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../config/boot'
3
+ require 'commands/breakpointer'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../config/boot'
3
+ require 'commands/console'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../config/boot'
3
+ require 'commands/destroy'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../config/boot'
3
+ require 'commands/generate'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../../config/boot'
3
+ require 'commands/performance/benchmarker'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../../config/boot'
3
+ require 'commands/performance/profiler'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../config/boot'
3
+ require 'commands/plugin'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../../config/boot'
3
+ require 'commands/process/reaper'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../../config/boot'
3
+ require 'commands/process/spawner'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../../config/boot'
3
+ require 'commands/process/spinner'
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+ require File.dirname(__FILE__) + '/../config/boot'
3
+ require 'commands/runner'
@@ -0,0 +1,42 @@
1
+ #!/usr/bin/env ruby
2
+ #
3
+ # = BioRuby shell on Rails server - GUI for the BioRuby shell
4
+ #
5
+ # Copyright:: Copyright (C) 2006
6
+ # Nobuya Tanaka <t@chemruby.org>,
7
+ # Toshiaki Katayama <k@bioruby.org>
8
+ # License:: Ruby's
9
+ #
10
+ # $Id: server,v 1.1 2006/02/27 11:16:23 k Exp $
11
+ #
12
+
13
+ require 'bio/shell'
14
+ require 'drb/drb'
15
+
16
+ require './app/models/shell_connection'
17
+ $drb_server = ShellConnection.new
18
+
19
+ ## Access Control List
20
+ #
21
+ # require 'drb/acl'
22
+ #
23
+ # list = %w(deny all
24
+ # allow 127.0.0.1
25
+ # )
26
+ # acl = ACL.new(list, ACL::DENY_ALLOW)
27
+ # DRb.install_acl(acl)
28
+ #
29
+
30
+ STDOUT.sync = true
31
+
32
+ #uri = "druby://localhost:0"
33
+ uri = 'druby://localhost:81064' # baioroji-
34
+ DRb.start_service(uri, $drb_server)
35
+ puts DRb.uri
36
+
37
+ puts "starting ..."
38
+
39
+ require './config/boot'
40
+ require 'commands/server'
41
+
42
+ puts "exiting ..."
@@ -0,0 +1,28 @@
1
+ ENV["RAILS_ENV"] = "test"
2
+ require File.expand_path(File.dirname(__FILE__) + "/../config/environment")
3
+ require 'test_help'
4
+
5
+ class Test::Unit::TestCase
6
+ # Transactional fixtures accelerate your tests by wrapping each test method
7
+ # in a transaction that's rolled back on completion. This ensures that the
8
+ # test database remains unchanged so your fixtures don't have to be reloaded
9
+ # between every test method. Fewer database queries means faster tests.
10
+ #
11
+ # Read Mike Clark's excellent walkthrough at
12
+ # http://clarkware.com/cgi/blosxom/2005/10/24#Rails10FastTesting
13
+ #
14
+ # Every Active Record database supports transactions except MyISAM tables
15
+ # in MySQL. Turn off transactional fixtures in this case; however, if you
16
+ # don't care one way or the other, switching from MyISAM to InnoDB tables
17
+ # is recommended.
18
+ self.use_transactional_fixtures = true
19
+
20
+ # Instantiated fixtures are slow, but give you @david where otherwise you
21
+ # would need people(:david). If you don't want to migrate your existing
22
+ # test cases which use the @david style and don't mind the speed hit (each
23
+ # instantiated fixtures translates to a database query per test method),
24
+ # then set this back to true.
25
+ self.use_instantiated_fixtures = false
26
+
27
+ # Add more helper methods to be used by all tests here...
28
+ end
@@ -0,0 +1,90 @@
1
+ #
2
+ # = bio/shell/web.rb - GUI for the BioRuby shell
3
+ #
4
+ # Copyright:: Copyright (C) 2006
5
+ # Nobuya Tanaka <t@chemruby.org>,
6
+ # Toshiaki Katayama <k@bioruby.org>
7
+ # License:: Ruby's
8
+ #
9
+ # $Id: web.rb,v 1.1 2006/02/27 09:22:42 k Exp $
10
+ #
11
+
12
+
13
+ module Bio::Shell
14
+
15
+ private
16
+
17
+ def rails_directory_setup
18
+ server = "script/server"
19
+ unless File.exists?(server)
20
+ require 'fileutils'
21
+ basedir = File.dirname(__FILE__)
22
+ print "Copying web server files ... "
23
+ FileUtils.cp_r("#{basedir}/rails/.", ".")
24
+ puts "done"
25
+ end
26
+ end
27
+
28
+ def rails_server_setup
29
+ require 'open3'
30
+ $web_server = Open3.popen3(server)
31
+
32
+ $web_error_log = File.open("log/web-error.log", "a")
33
+ $web_server[2].reopen($web_error_log)
34
+
35
+ while line = $web_server[1].gets
36
+ if line[/druby:\/\/localhost/]
37
+ uri = line.chomp
38
+ puts uri if $DEBUG
39
+ break
40
+ end
41
+ end
42
+
43
+ $web_access_log = File.open("log/web-access.log", "a")
44
+ $web_server[1].reopen($web_access_log)
45
+
46
+ return uri
47
+ end
48
+
49
+ def web
50
+ return if $web_server
51
+
52
+ require 'drb/drb'
53
+ # $SAFE = 1 # disable eval() and friends
54
+
55
+ rails_directory_setup
56
+ #uri = rails_server_setup
57
+ uri = 'druby://localhost:81064' # baioroji-
58
+
59
+ $drb_server = DRbObject.new_with_uri(uri)
60
+ $drb_server.puts_remote("Connected")
61
+
62
+ puts "Connected to server #{uri}"
63
+ puts "Open http://localhost:3000/shell/"
64
+
65
+ io = IRB.conf[:MAIN_CONTEXT].io
66
+
67
+ io.class.class_eval do
68
+ alias_method :shell_original_gets, :gets
69
+ end
70
+
71
+ def io.gets
72
+ bind = IRB.conf[:MAIN_CONTEXT].workspace.binding
73
+ vars = eval("local_variables", bind)
74
+ vars.each do |var|
75
+ next if var == "_"
76
+ if val = eval("#{var}", bind)
77
+ $drb_server[var] = val
78
+ else
79
+ $drb_server.delete(var)
80
+ end
81
+ end
82
+ line = shell_original_gets
83
+ line
84
+ end
85
+ end
86
+
87
+ end
88
+
89
+
90
+
@@ -1,14 +1,236 @@
1
- module Bio
2
-
3
1
  #
4
- # bio/util/contingency_table.rb - Statistical contingency table analysis for aligned sequences
2
+ # = bio/util/contingency_table.rb - Statistical contingency table analysis for aligned sequences
5
3
  #
6
4
  # Copyright:: Copyright (C) 2005 Trevor Wennblom <trevor@corevx.com>
7
5
  # License:: LGPL
8
6
  #
9
- # $Id: contingency_table.rb,v 1.2 2005/12/13 14:58:37 trevor Exp $
10
- #
7
+ # $Id: contingency_table.rb,v 1.4 2006/02/27 13:23:01 k Exp $
11
8
  #
9
+ # == Synopsis
10
+ #
11
+ # The Bio::ContingencyTable class provides basic statistical contingency table
12
+ # analysis for two positions within aligned sequences.
13
+ #
14
+ # When ContingencyTable is instantiated the set of characters in the
15
+ # aligned sequences may be passed to it as an array. This is
16
+ # important since it uses these characters to create the table's rows
17
+ # and columns. If this array is not passed it will use it's default
18
+ # of an amino acid and nucleotide alphabet in lowercase along with the
19
+ # clustal spacer '-'.
20
+ #
21
+ # To get data from the table the most used functions will be
22
+ # chi_square and contingency_coefficient:
23
+ #
24
+ # ctable = Bio::ContingencyTable.new()
25
+ # ctable['a']['t'] += 1
26
+ # # .. put more values into the table
27
+ # puts ctable.chi_square
28
+ # puts ctable.contingency_coefficient # between 0.0 and 1.0
29
+ #
30
+ # The contingency_coefficient represents the degree of correlation of
31
+ # change between two sequence positions in a multiple-sequence
32
+ # alignment. 0.0 indicates no correlation, 1.0 is the maximum
33
+ # correlation.
34
+ #
35
+ #
36
+ # == Further Reading
37
+ #
38
+ # * http://en.wikipedia.org/wiki/Contingency_table
39
+ # * http://www.physics.csbsju.edu/stats/exact.details.html
40
+ # * Numerical Recipes in C by Press, Flannery, Teukolsky, and Vetterling
41
+ # #
42
+ # == Usage
43
+ #
44
+ # What follows is an example of ContingencyTable in typical usage
45
+ # analyzing results from a clustal alignment.
46
+ #
47
+ # require 'bio'
48
+ # require 'bio/contingency_table'
49
+ #
50
+ # seqs = {}
51
+ # max_length = 0
52
+ # Bio::ClustalW::Report.new( IO.read('sample.aln') ).to_a.each do |entry|
53
+ # data = entry.data.strip
54
+ # seqs[entry.definition] = data.downcase
55
+ # max_length = data.size if max_length == 0
56
+ # raise "Aligned sequences must be the same length!" unless data.size == max_length
57
+ # end
58
+ #
59
+ # VERBOSE = true
60
+ # puts "i\tj\tchi_square\tcontingency_coefficient" if VERBOSE
61
+ # correlations = {}
62
+ #
63
+ # 0.upto(max_length - 1) do |i|
64
+ # (i+1).upto(max_length - 1) do |j|
65
+ # ctable = Bio::ContingencyTable.new()
66
+ # seqs.each_value { |seq| ctable.table[ seq[i].chr ][ seq[j].chr ] += 1 }
67
+ #
68
+ # chi_square = ctable.chi_square
69
+ # contingency_coefficient = ctable.contingency_coefficient
70
+ # puts [(i+1), (j+1), chi_square, contingency_coefficient].join("\t") if VERBOSE
71
+ #
72
+ # correlations["#{i+1},#{j+1}"] = contingency_coefficient
73
+ # correlations["#{j+1},#{i+1}"] = contingency_coefficient # Both ways are accurate
74
+ # end
75
+ # end
76
+ #
77
+ # require 'yaml'
78
+ # File.new('results.yml', 'a+') { |f| f.puts correlations.to_yaml }
79
+ #
80
+ #
81
+ # == Tutorial
82
+ #
83
+
84
+ # ContingencyTable returns the statistical significance of change
85
+ # between two positions in an alignment. If you would like to see how
86
+ # every possible combination of positions in your alignment compares
87
+ # to one another you must set this up yourself. Hopefully the
88
+ # provided examples will help you get started without too much
89
+ # trouble.
90
+ #
91
+ # def lite_example(sequences, max_length, characters)
92
+ #
93
+ # %w{i j chi_square contingency_coefficient}.each { |x| print x.ljust(12) }
94
+ # puts
95
+ #
96
+ # 0.upto(max_length - 1) do |i|
97
+ # (i+1).upto(max_length - 1) do |j|
98
+ # ctable = Bio::ContingencyTable.new( characters )
99
+ # sequences.each do |seq|
100
+ # i_char = seq[i].chr
101
+ # j_char = seq[j].chr
102
+ # ctable.table[i_char][j_char] += 1
103
+ # end
104
+ # chi_square = ctable.chi_square
105
+ # contingency_coefficient = ctable.contingency_coefficient
106
+ # [(i+1), (j+1), chi_square, contingency_coefficient].each { |x| print x.to_s.ljust(12) }
107
+ # puts
108
+ # end
109
+ # end
110
+ #
111
+ # end
112
+ #
113
+ # allowed_letters = Array.new
114
+ # allowed_letters = 'abcdefghijk'.split('')
115
+ #
116
+ # seqs = Array.new
117
+ # seqs << 'abcde'
118
+ # seqs << 'abcde'
119
+ # seqs << 'aacje'
120
+ # seqs << 'aacae'
121
+ #
122
+ # length_of_every_sequence = seqs[0].size # 5 letters long
123
+ #
124
+ # lite_example(seqs, length_of_every_sequence, allowed_letters)
125
+ #
126
+ #
127
+ # Producing the following results:
128
+ #
129
+ # i j chi_square contingency_coefficient
130
+ # 1 2 0.0 0.0
131
+ # 1 3 0.0 0.0
132
+ # 1 4 0.0 0.0
133
+ # 1 5 0.0 0.0
134
+ # 2 3 0.0 0.0
135
+ # 2 4 4.0 0.707106781186548
136
+ # 2 5 0.0 0.0
137
+ # 3 4 0.0 0.0
138
+ # 3 5 0.0 0.0
139
+ # 4 5 0.0 0.0
140
+ #
141
+ # The position i=2 and j=4 has a high contingency coefficient
142
+ # indicating that the changes at these positions are related. Note
143
+ # that i and j are arbitrary, this could be represented as i=4 and j=2
144
+ # since they both refer to position two and position four in the
145
+ # alignment. Here are some more examples:
146
+ #
147
+ # seqs = Array.new
148
+ # seqs << 'abcde'
149
+ # seqs << 'abcde'
150
+ # seqs << 'aacje'
151
+ # seqs << 'aacae'
152
+ # seqs << 'akcfe'
153
+ # seqs << 'akcfe'
154
+ #
155
+ # length_of_every_sequence = seqs[0].size # 5 letters long
156
+ #
157
+ # lite_example(seqs, length_of_every_sequence, allowed_letters)
158
+ #
159
+ #
160
+ # Results:
161
+ #
162
+ # i j chi_square contingency_coefficient
163
+ # 1 2 0.0 0.0
164
+ # 1 3 0.0 0.0
165
+ # 1 4 0.0 0.0
166
+ # 1 5 0.0 0.0
167
+ # 2 3 0.0 0.0
168
+ # 2 4 12.0 0.816496580927726
169
+ # 2 5 0.0 0.0
170
+ # 3 4 0.0 0.0
171
+ # 3 5 0.0 0.0
172
+ # 4 5 0.0 0.0
173
+ #
174
+ # Here we can see that the strength of the correlation of change has
175
+ # increased when more data is added with correlated changes at the
176
+ # same positions.
177
+ #
178
+ # seqs = Array.new
179
+ # seqs << 'abcde'
180
+ # seqs << 'abcde'
181
+ # seqs << 'kacje' # changed first letter
182
+ # seqs << 'aacae'
183
+ # seqs << 'akcfa' # changed last letter
184
+ # seqs << 'akcfe'
185
+ #
186
+ # length_of_every_sequence = seqs[0].size # 5 letters long
187
+ #
188
+ # lite_example(seqs, length_of_every_sequence, allowed_letters)
189
+ #
190
+ #
191
+ # Results:
192
+ #
193
+ # i j chi_square contingency_coefficient
194
+ # 1 2 2.4 0.534522483824849
195
+ # 1 3 0.0 0.0
196
+ # 1 4 6.0 0.707106781186548
197
+ # 1 5 0.24 0.196116135138184
198
+ # 2 3 0.0 0.0
199
+ # 2 4 12.0 0.816496580927726
200
+ # 2 5 2.4 0.534522483824849
201
+ # 3 4 0.0 0.0
202
+ # 3 5 0.0 0.0
203
+ # 4 5 2.4 0.534522483824849
204
+ #
205
+ # With random changes it becomes more difficult to identify correlated
206
+ # changes, yet positions two and four still have the highest
207
+ # correlation as indicated by the contingency coefficient. The best
208
+ # way to improve the accuracy of your results, as is often the case
209
+ # with statistics, is to increase the sample size.
210
+ #
211
+ #
212
+ # == A Note on Efficiency
213
+ #
214
+
215
+ # ContingencyTable is slow. It involves many calculations for even a
216
+ # seemingly small five-string data set. Even worse, it's very
217
+ # dependent on matrix traversal, and this is done with two dimensional
218
+ # hashes which dashes any hope of decent speed.
219
+ #
220
+
221
+ # Finally, half of the matrix is redundant and positions could be
222
+ # summed with their companion position to reduce calculations. For
223
+ # example the positions (5,2) and (2,5) could both have their values
224
+ # added together and just stored in (2,5) while (5,2) could be an
225
+ # illegal position. Also, positions (1,1), (2,2), (3,3), etc. will
226
+ # never be used.
227
+ #
228
+ # The purpose of this package is flexibility and education. The code
229
+ # is short and to the point in aims of achieving that purpose. If the
230
+ # BioRuby project moves towards C extensions in the future a
231
+ # professional caliber version will likely be created.
232
+ #
233
+ #
12
234
  #--
13
235
  #
14
236
  # This library is free software; you can redistribute it and/or
@@ -29,225 +251,7 @@ module Bio
29
251
  #
30
252
  #
31
253
 
32
- =begin rdoc
33
- bio/util/contingency_table.rb - Statistical contingency table analysis for aligned sequences
34
-
35
- == Synopsis
36
-
37
- The Bio::ContingencyTable class provides basic statistical contingency table
38
- analysis for two positions within aligned sequences.
39
-
40
- When ContingencyTable is instantiated the set of characters in the aligned sequences may be
41
- passed to it as an array. This is important since it uses these characters
42
- to create the table's rows and columns. If this array is not passed it will
43
- use it's default of an amino acid and nucleotide alphabet in lowercase along with the
44
- clustal spacer '-'.
45
-
46
- To get data from the table the most used functions will be chi_square and contingency_coefficient:
47
- ctable = Bio::ContingencyTable.new()
48
- ctable['a']['t'] += 1
49
- # .. put more values into the table
50
- puts ctable.chi_square
51
- puts ctable.contingency_coefficient # between 0.0 and 1.0
52
-
53
- The contingency_coefficient represents the degree of correlation of change between two
54
- sequence positions in a multiple-sequence alignment. 0.0 indicates no correlation, 1.0 is the
55
- maximum correlation.
56
-
57
-
58
- == Further Reading
59
-
60
- * http://en.wikipedia.org/wiki/Contingency_table
61
- * http://www.physics.csbsju.edu/stats/exact.details.html
62
- * Numerical Recipes in C by Press, Flannery, Teukolsky, and Vetterling
63
-
64
-
65
- == Usage
66
-
67
- What follows is an example of ContingencyTable in typical usage analyzing results from a clustal alignment.
68
-
69
- require 'bio'
70
- require 'bio/contingency_table'
71
-
72
- seqs = {}
73
- max_length = 0
74
- Bio::ClustalW::Report.new( IO.read('sample.aln') ).to_a.each do |entry|
75
- data = entry.data.strip
76
- seqs[entry.definition] = data.downcase
77
- max_length = data.size if max_length == 0
78
- raise "Aligned sequences must be the same length!" unless data.size == max_length
79
- end
80
-
81
- VERBOSE = true
82
- puts "i\tj\tchi_square\tcontingency_coefficient" if VERBOSE
83
- correlations = {}
84
-
85
- 0.upto(max_length - 1) do |i|
86
- (i+1).upto(max_length - 1) do |j|
87
- ctable = Bio::ContingencyTable.new()
88
- seqs.each_value { |seq| ctable.table[ seq[i].chr ][ seq[j].chr ] += 1 }
89
-
90
- chi_square = ctable.chi_square
91
- contingency_coefficient = ctable.contingency_coefficient
92
- puts [(i+1), (j+1), chi_square, contingency_coefficient].join("\t") if VERBOSE
93
-
94
- correlations["#{i+1},#{j+1}"] = contingency_coefficient
95
- correlations["#{j+1},#{i+1}"] = contingency_coefficient # Both ways are accurate
96
- end
97
- end
98
-
99
- require 'yaml'
100
- File.new('results.yml', 'a+') { |f| f.puts correlations.to_yaml }
101
-
102
-
103
- == Tutorial
104
-
105
- ContingencyTable returns the statistical significance of change between two positions in an alignment.
106
- If you would like to see how every possible combination of positions in your alignment compares to one another
107
- you must set this up yourself. Hopefully the provided examples will help you get started without
108
- too much trouble.
109
-
110
- def lite_example(sequences, max_length, characters)
111
-
112
- %w{i j chi_square contingency_coefficient}.each { |x| print x.ljust(12) }
113
- puts
114
-
115
- 0.upto(max_length - 1) do |i|
116
- (i+1).upto(max_length - 1) do |j|
117
- ctable = Bio::ContingencyTable.new( characters )
118
- sequences.each do |seq|
119
- i_char = seq[i].chr
120
- j_char = seq[j].chr
121
- ctable.table[i_char][j_char] += 1
122
- end
123
- chi_square = ctable.chi_square
124
- contingency_coefficient = ctable.contingency_coefficient
125
- [(i+1), (j+1), chi_square, contingency_coefficient].each { |x| print x.to_s.ljust(12) }
126
- puts
127
- end
128
- end
129
-
130
- end
131
-
132
- allowed_letters = Array.new
133
- allowed_letters = 'abcdefghijk'.split('')
134
-
135
- seqs = Array.new
136
- seqs << 'abcde'
137
- seqs << 'abcde'
138
- seqs << 'aacje'
139
- seqs << 'aacae'
140
-
141
- length_of_every_sequence = seqs[0].size # 5 letters long
142
-
143
- lite_example(seqs, length_of_every_sequence, allowed_letters)
144
-
145
-
146
- Producing the following results:
147
-
148
- i j chi_square contingency_coefficient
149
- 1 2 0.0 0.0
150
- 1 3 0.0 0.0
151
- 1 4 0.0 0.0
152
- 1 5 0.0 0.0
153
- 2 3 0.0 0.0
154
- 2 4 4.0 0.707106781186548
155
- 2 5 0.0 0.0
156
- 3 4 0.0 0.0
157
- 3 5 0.0 0.0
158
- 4 5 0.0 0.0
159
-
160
- The position i=2 and j=4 has a high contingency coefficient indicating that the changes at these
161
- positions are related. Note that i and j are arbitrary, this could be represented as i=4 and j=2
162
- since they both refer to position two and position four in the alignment. Here are some more examples:
163
-
164
- seqs = Array.new
165
- seqs << 'abcde'
166
- seqs << 'abcde'
167
- seqs << 'aacje'
168
- seqs << 'aacae'
169
- seqs << 'akcfe'
170
- seqs << 'akcfe'
171
-
172
- length_of_every_sequence = seqs[0].size # 5 letters long
173
-
174
- lite_example(seqs, length_of_every_sequence, allowed_letters)
175
-
176
-
177
- Results:
178
-
179
- i j chi_square contingency_coefficient
180
- 1 2 0.0 0.0
181
- 1 3 0.0 0.0
182
- 1 4 0.0 0.0
183
- 1 5 0.0 0.0
184
- 2 3 0.0 0.0
185
- 2 4 12.0 0.816496580927726
186
- 2 5 0.0 0.0
187
- 3 4 0.0 0.0
188
- 3 5 0.0 0.0
189
- 4 5 0.0 0.0
190
-
191
- Here we can see that the strength of the correlation of change has increased when more data is added with correlated changes at the same positions.
192
-
193
- seqs = Array.new
194
- seqs << 'abcde'
195
- seqs << 'abcde'
196
- seqs << 'kacje' # changed first letter
197
- seqs << 'aacae'
198
- seqs << 'akcfa' # changed last letter
199
- seqs << 'akcfe'
200
-
201
- length_of_every_sequence = seqs[0].size # 5 letters long
202
-
203
- lite_example(seqs, length_of_every_sequence, allowed_letters)
204
-
205
-
206
- Results:
207
-
208
- i j chi_square contingency_coefficient
209
- 1 2 2.4 0.534522483824849
210
- 1 3 0.0 0.0
211
- 1 4 6.0 0.707106781186548
212
- 1 5 0.24 0.196116135138184
213
- 2 3 0.0 0.0
214
- 2 4 12.0 0.816496580927726
215
- 2 5 2.4 0.534522483824849
216
- 3 4 0.0 0.0
217
- 3 5 0.0 0.0
218
- 4 5 2.4 0.534522483824849
219
-
220
- With random changes it becomes more difficult to identify correlated changes, yet positions two
221
- and four still have the highest correlation as indicated by the contingency coefficient. The
222
- best way to improve the accuracy of your results, as is often the case with statistics, is to
223
- increase the sample size.
224
-
225
-
226
- == A Note on Efficiency
227
-
228
- ContingencyTable is slow. It involves many calculations for even a seemingly small five-string data set.
229
- Even worse, it's very dependent on matrix traversal, and this is done with two dimensional hashes which
230
- dashes any hope of decent speed.
231
-
232
- Finally, half of the matrix is redundant and positions could be summed with their companion position to reduce
233
- calculations. For example the positions (5,2) and (2,5) could both have their values added together and
234
- just stored in (2,5) while (5,2) could be an illegal position. Also, positions (1,1), (2,2), (3,3), etc.
235
- will never be used.
236
-
237
- The purpose of this package is flexibility and education. The code is short and to the point in
238
- aims of achieving that purpose. If the BioRuby project moves towards C extensions in the future a
239
- professional caliber version will likely be created.
240
-
241
-
242
- == Author
243
- Trevor Wennblom <trevor@corevx.com>
244
-
245
-
246
- == Copyright
247
- Copyright (C) 2005 Trevor Wennblom
248
- Licensed under the same terms as BioRuby.
249
-
250
- =end
254
+ module Bio
251
255
 
252
256
  class ContingencyTable
253
257
  # Since we're making this math-notation friendly here is the layout of @table:
@@ -334,4 +338,6 @@ class ContingencyTable
334
338
  end
335
339
 
336
340
  end
337
- end
341
+
342
+ end # Bio
343
+
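
Note on the contingency_table.rb documentation above: the tutorial tables can be checked by hand. The sketch below is not the library's implementation and does not call Bio::ContingencyTable. It assumes the documented values are Pearson's chi-square and Pearson's contingency coefficient, sqrt(chi2 / (n + chi2)), which is consistent with the 4.0 / 0.7071 and 12.0 / 0.8165 rows shown in the diff. The helper names column_pair_counts, chi_square and contingency_coefficient are hypothetical and chosen for illustration only.

```ruby
# Standalone sketch (assumption: Pearson chi-square over observed characters only).
# Counts how often each pair of characters occurs at columns i and j (zero-based).
def column_pair_counts(sequences, i, j)
  counts = Hash.new(0)
  sequences.each { |seq| counts[[seq[i, 1], seq[j, 1]]] += 1 }
  counts
end

# Pearson chi-square: sum over all cells of (observed - expected)^2 / expected,
# where expected = row_total * column_total / n.
def chi_square(counts)
  n = counts.values.sum.to_f
  row_totals = Hash.new(0)
  col_totals = Hash.new(0)
  counts.each do |(row_char, col_char), k|
    row_totals[row_char] += k
    col_totals[col_char] += k
  end

  total = 0.0
  row_totals.each do |row_char, row_sum|
    col_totals.each do |col_char, col_sum|
      expected = row_sum * col_sum / n
      observed = counts.fetch([row_char, col_char], 0)
      total += (observed - expected)**2 / expected
    end
  end
  total
end

# Pearson's contingency coefficient: sqrt(chi2 / (n + chi2)).
def contingency_coefficient(counts)
  n = counts.values.sum
  chi2 = chi_square(counts)
  Math.sqrt(chi2 / (n + chi2))
end

# First tutorial data set; alignment positions 2 and 4 are indexes 1 and 3.
seqs = %w[abcde abcde aacje aacae]
counts = column_pair_counts(seqs, 1, 3)
puts chi_square(counts)               # 4.0, as in the first results table
puts contingency_coefficient(counts)  # about 0.7071
```

Only characters that actually occur in the two columns contribute rows and columns here, so no expected count is ever zero; how the library itself treats unused alphabet characters is not visible in this part of the diff.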