bio 1.4.1 → 1.4.2
- data/ChangeLog +954 -0
- data/KNOWN_ISSUES.rdoc +40 -5
- data/README.rdoc +36 -35
- data/RELEASE_NOTES.rdoc +87 -59
- data/bioruby.gemspec +24 -2
- data/doc/RELEASE_NOTES-1.4.1.rdoc +104 -0
- data/doc/Tutorial.rd +162 -200
- data/doc/Tutorial.rd.html +149 -146
- data/lib/bio.rb +1 -0
- data/lib/bio/appl/blast.rb +1 -1
- data/lib/bio/appl/blast/ddbj.rb +26 -34
- data/lib/bio/appl/blast/genomenet.rb +21 -11
- data/lib/bio/db/embl/sptr.rb +193 -21
- data/lib/bio/db/fasta.rb +1 -1
- data/lib/bio/db/fastq.rb +14 -0
- data/lib/bio/db/fastq/format_fastq.rb +2 -2
- data/lib/bio/db/genbank/ddbj.rb +1 -2
- data/lib/bio/db/genbank/format_genbank.rb +1 -1
- data/lib/bio/db/medline.rb +1 -0
- data/lib/bio/db/newick.rb +3 -1
- data/lib/bio/db/pdb/pdb.rb +9 -9
- data/lib/bio/db/pdb/residue.rb +2 -2
- data/lib/bio/io/ddbjrest.rb +344 -0
- data/lib/bio/io/ncbirest.rb +121 -1
- data/lib/bio/location.rb +2 -2
- data/lib/bio/reference.rb +3 -4
- data/lib/bio/shell/plugin/entry.rb +7 -3
- data/lib/bio/shell/plugin/ncbirest.rb +5 -1
- data/lib/bio/util/restriction_enzyme.rb +3 -0
- data/lib/bio/util/restriction_enzyme/dense_int_array.rb +195 -0
- data/lib/bio/util/restriction_enzyme/range/sequence_range.rb +7 -7
- data/lib/bio/util/restriction_enzyme/range/sequence_range/calculated_cuts.rb +57 -18
- data/lib/bio/util/restriction_enzyme/range/sequence_range/fragment.rb +2 -2
- data/lib/bio/util/restriction_enzyme/sorted_num_array.rb +219 -0
- data/lib/bio/version.rb +1 -1
- data/sample/test_restriction_enzyme_long.rb +4403 -0
- data/test/data/fasta/EFTU_BACSU.fasta +8 -0
- data/test/data/genbank/CAA35997.gp +48 -0
- data/test/data/genbank/SCU49845.gb +167 -0
- data/test/data/litdb/1717226.litdb +13 -0
- data/test/data/pir/CRAB_ANAPL.pir +6 -0
- data/test/functional/bio/appl/blast/test_remote.rb +93 -0
- data/test/functional/bio/appl/test_blast.rb +61 -0
- data/test/functional/bio/io/test_ddbjrest.rb +47 -0
- data/test/functional/bio/test_command.rb +3 -3
- data/test/unit/bio/db/embl/test_sptr.rb +6 -6
- data/test/unit/bio/db/embl/test_uniprot_new_part.rb +208 -0
- data/test/unit/bio/db/genbank/test_common.rb +274 -0
- data/test/unit/bio/db/genbank/test_genbank.rb +401 -0
- data/test/unit/bio/db/genbank/test_genpept.rb +81 -0
- data/test/unit/bio/db/pdb/test_pdb.rb +3287 -11
- data/test/unit/bio/db/test_fasta.rb +34 -12
- data/test/unit/bio/db/test_fastq.rb +26 -0
- data/test/unit/bio/db/test_litdb.rb +95 -0
- data/test/unit/bio/db/test_medline.rb +1 -0
- data/test/unit/bio/db/test_nbrf.rb +82 -0
- data/test/unit/bio/db/test_newick.rb +22 -4
- data/test/unit/bio/test_reference.rb +35 -0
- data/test/unit/bio/util/restriction_enzyme/test_dense_int_array.rb +201 -0
- data/test/unit/bio/util/restriction_enzyme/test_sorted_num_array.rb +281 -0
- metadata +44 -38
data/doc/Tutorial.rd.html
CHANGED
@@ -11,29 +11,29 @@
|
|
11
11
|
<h1><a name="label-0" id="label-0">BioRuby Tutorial</a></h1><!-- RDLabel: "BioRuby Tutorial" -->
|
12
12
|
<ul>
|
13
13
|
<li>Copyright (C) 2001-2003 KATAYAMA Toshiaki <k .at. bioruby.org></li>
|
14
|
-
<li>Copyright (C) 2005-
|
14
|
+
<li>Copyright (C) 2005-2011 Pjotr Prins, Naohisa Goto and others</li>
|
15
15
|
</ul>
|
16
|
-
<p>This document was last modified:
|
17
|
-
Current editor:
|
18
|
-
<p>The latest version resides in the GIT source code repository: ./doc/<a href="
|
16
|
+
<p>This document was last modified: 2011/03/24
|
17
|
+
Current editor: Michael O'Keefe <okeefm (at) rpi (dot) edu></p>
|
18
|
+
<p>The latest version resides in the GIT source code repository: ./doc/<a href="https://github.com/bioruby/bioruby/blob/master/doc/Tutorial.rd">Tutorial.rd</a>.</p>
|
19
19
|
<h2><a name="label-1" id="label-1">Introduction</a></h2><!-- RDLabel: "Introduction" -->
|
20
20
|
<p>This is a tutorial for using Bioruby. A basic knowledge of Ruby is required.
|
21
|
-
If you want to know more about the programming
|
21
|
+
If you want to know more about the programming language, we recommend the
|
22
22
|
latest Ruby book <a href="http://www.pragprog.com/titles/ruby">Programming Ruby</a>
|
23
|
-
by Dave Thomas and Andy Hunt - the first edition
|
23
|
+
by Dave Thomas and Andy Hunt - the first edition can be read online
|
24
24
|
<a href="http://www.ruby-doc.org/docs/ProgrammingRuby/">here</a>.</p>
|
25
25
|
<p>For BioRuby you need to install Ruby and the BioRuby package on your computer</p>
|
26
26
|
<p>You can check whether Ruby is installed on your computer and what
|
27
27
|
version it has with the</p>
|
28
28
|
<pre>% ruby -v</pre>
|
29
|
-
<p>command.
|
29
|
+
<p>command. You should see something like:</p>
|
30
30
|
<pre>ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]</pre>
|
31
31
|
<p>If you see no such thing you'll have to install Ruby using your installation
|
32
32
|
manager. For more information see the
|
33
33
|
<a href="http://www.ruby-lang.org/en/">Ruby</a> website.</p>
|
34
34
|
<p>With Ruby download and install Bioruby using the links on the
|
35
35
|
<a href="http://bioruby.org/">Bioruby</a> website. The recommended installation is via
|
36
|
-
|
36
|
+
RubyGems:</p>
|
37
37
|
<pre>gem install bio</pre>
|
38
38
|
<p>See also the Bioruby <a href="http://bioruby.open-bio.org/wiki/Installation">wiki</a>.</p>
|
39
39
|
<p>A lot of BioRuby's documentation exists in the source code and unit tests. To
|
@@ -41,9 +41,10 @@ really dive in you will need the latest source code tree. The embedded rdoc
|
|
41
41
|
documentation can be viewed online at
|
42
42
|
<a href="http://bioruby.org/rdoc/">bioruby's rdoc</a>. But first lets start!</p>
|
43
43
|
<h2><a name="label-2" id="label-2">Trying Bioruby</a></h2><!-- RDLabel: "Trying Bioruby" -->
|
44
|
-
<p>Bioruby comes with its own shell. After unpacking the sources run the
|
45
|
-
|
46
|
-
<
|
44
|
+
<p>Bioruby comes with its own shell. After unpacking the sources run one of the following commands:</p>
|
45
|
+
<pre>bioruby</pre>
|
46
|
+
<p>or, from the source tree</p>
|
47
|
+
<pre>cd bioruby
|
47
48
|
ruby -I lib bin/bioruby</pre>
|
48
49
|
<p>and you should see a prompt</p>
|
49
50
|
<pre>bioruby></pre>
|
@@ -60,11 +61,11 @@ question to the mailing list. BioRuby developers usually try to help.</p>
|
|
60
61
|
<h2><a name="label-3" id="label-3">Working with nucleic / amino acid sequences (Bio::Sequence class)</a></h2><!-- RDLabel: "Working with nucleic / amino acid sequences (Bio::Sequence class)" -->
|
61
62
|
<p>The Bio::Sequence class allows the usual sequence transformations and
|
62
63
|
translations. In the example below the DNA sequence "atgcatgcaaaa" is
|
63
|
-
converted into the complemental strand
|
64
|
-
next the nucleic acid composition is calculated and the sequence is
|
64
|
+
converted into the complemental strand and spliced into a subsequence;
|
65
|
+
next, the nucleic acid composition is calculated and the sequence is
|
65
66
|
translated into the amino acid sequence, the molecular weight
|
66
|
-
calculated, and so on. When translating into amino acid sequences the
|
67
|
-
frame can be specified and optionally the
|
67
|
+
calculated, and so on. When translating into amino acid sequences, the
|
68
|
+
frame can be specified and optionally the codon table selected (as
|
68
69
|
defined in codontable.rb).</p>
|
69
70
|
<pre>bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa")
|
70
71
|
==> "atgcatgcaaaa"
|
@@ -73,7 +74,7 @@ defined in codontable.rb).</p>
|
|
73
74
|
bioruby> seq.complement
|
74
75
|
==> "ttttgcatgcat"
|
75
76
|
|
76
|
-
bioruby> seq.subseq(3,8) # gets subsequence of positions 3 to 8
|
77
|
+
bioruby> seq.subseq(3,8) # gets subsequence of positions 3 to 8 (starting from 1)
|
77
78
|
==> "gcatgc"
|
78
79
|
bioruby> seq.gc_percent
|
79
80
|
==> 33
|
@@ -112,10 +113,10 @@ Windows). For example</p>
|
|
112
113
|
<pre>% ri puts
|
113
114
|
% ri p
|
114
115
|
% ri File.open</pre>
|
115
|
-
<p>Nucleic acid sequence
|
116
|
-
amino acid sequence
|
116
|
+
<p>Nucleic acid sequence are members of the Bio::Sequence::NA class, and
|
117
|
+
amino acid sequence are members of the Bio::Sequence::AA class. Shared
|
117
118
|
methods are in the parent Bio::Sequence class.</p>
|
118
|
-
<p>As Bio::Sequence
|
119
|
+
<p>As Bio::Sequence inherits Ruby's String class, you can use
|
119
120
|
String class methods. For example, to get a subsequence, you can
|
120
121
|
not only use subseq(from, to) but also String#[].</p>
|
121
122
|
<p>Please take note that the Ruby's string's are base 0 - i.e. the first letter
|
@@ -128,14 +129,13 @@ bioruby> s[0..1]
|
|
128
129
|
==> "ab"</pre>
|
129
130
|
<p>So when using String methods, you should subtract 1 from positions
|
130
131
|
conventionally used in biology. (subseq method will throw an exception if you
|
131
|
-
specify positions smaller than or equal to 0 for either one of the "from" or
|
132
|
-
"to".)</p>
|
132
|
+
specify positions smaller than or equal to 0 for either one of the "from" or "to".)</p>
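Review note: the off-by-one relationship described in this paragraph can be sketched in plain Ruby. The helper `subseq_1based` below is hypothetical and is not BioRuby's implementation. It only mimics the behaviour the tutorial text describes: 1-based inclusive positions, and an error for positions of 0 or less.

```ruby
# Hypothetical stand-in for the 1-based subseq(from, to) described above.
# This is NOT BioRuby code; it only mirrors the documented behaviour.
def subseq_1based(str, from, to)
  # The tutorial says positions smaller than or equal to 0 are rejected.
  raise ArgumentError, "positions start at 1" if from <= 0 || to <= 0
  # Shift both ends down by one to get 0-based String indices.
  str[(from - 1)..(to - 1)]
end

seq = "atgcatgcaaaa"
puts subseq_1based(seq, 3, 8)  # biological positions 3 to 8
puts seq[2..7]                 # the same span using 0-based String#[]
```

Both calls select the same six bases, which is the span the tutorial's `seq.subseq(3,8)` example returns.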
|
133
133
|
<p>The window_search(window_size, step_size) method shows a typical Ruby
|
134
134
|
way of writing concise and clear code using 'closures'. Each sliding
|
135
135
|
window creates a subsequence which is supplied to the enclosed block
|
136
136
|
through a variable named +s+.</p>
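Review note: for readers without BioRuby at hand, the sliding-window idea in this paragraph can be approximated in plain Ruby. `each_window` and `gc_percent` below are hypothetical helpers written for illustration. BioRuby's own window_search may differ in details such as its return value.

```ruby
# Hypothetical sliding-window helper, for illustration only.
# Yields each window of window_size characters, advancing by step_size.
def each_window(str, window_size, step_size = 1)
  last_start = str.length - window_size
  return if last_start < 0          # sequence shorter than one window
  0.step(last_start, step_size) do |i|
    yield str[i, window_size]       # the block receives each window as +s+
  end
end

# Percentage of g/c characters in a string (integer division).
def gc_percent(s)
  return 0 if s.empty?
  s.count("gc") * 100 / s.length
end

seq = "atgcatgcaattaagc"
each_window(seq, 8, 4) do |s|
  puts "#{s} #{gc_percent(s)}"
end
```

The block form keeps the per-window logic next to the call, which is the 'closure' style the paragraph refers to.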
|
137
137
|
<ul>
|
138
|
-
<li><p>Show average percentage of GC content for 20 bases (stepping the default one base at a time)
|
138
|
+
<li><p>Show average percentage of GC content for 20 bases (stepping the default one base at a time):</p>
|
139
139
|
<pre>bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa")
|
140
140
|
==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa"
|
141
141
|
|
@@ -195,8 +195,8 @@ my_naseq = Bio::Sequence::NA.new(input_seq)
|
|
195
195
|
my_aaseq = my_naseq.translate
|
196
196
|
|
197
197
|
puts my_aaseq</pre>
|
198
|
-
<p>Save the program as na2aa.rb. Prepare a nucleic acid sequence
|
199
|
-
described below and
|
198
|
+
<p>Save the program above as na2aa.rb. Prepare a nucleic acid sequence
|
199
|
+
described below and save it as my_naseq.txt:</p>
|
200
200
|
<pre>gtggcgatctttccgaaagcgatgactggagcgaagaaccaaagcagtgacatttgtctg
|
201
201
|
atgccgcacgtaggcctgataagacgcggacagcgtcgcatcaggcatcttgtgcaaatg
|
202
202
|
tcggatgcggcgtga</pre>
|
@@ -207,7 +207,7 @@ For example, translates my_naseq.txt:</p>
|
|
207
207
|
<pre>% cat my_naseq.txt|ruby na2aa.rb</pre>
|
208
208
|
<p>Outputs</p>
|
209
209
|
<pre>VAIFPKAMTGAKNQSSDICLMPHVGLIRRGQRRIRHLVQMSDAA*</pre>
|
210
|
-
<p>You can also write this, a bit
|
210
|
+
<p>You can also write this, a bit fancifully, as a one-liner script.</p>
|
211
211
|
<pre>% ruby -r bio -e 'p Bio::Sequence::NA.new($<.read).translate' my_naseq.txt</pre>
|
212
212
|
<p>In the next section we will retrieve data from databases instead of using raw
|
213
213
|
sequence files. One generic example of the above can be found in
|
@@ -215,7 +215,7 @@ sequence files. One generic example of the above can be found in
|
|
215
215
|
<h2><a name="label-4" id="label-4">Parsing GenBank data (Bio::GenBank class)</a></h2><!-- RDLabel: "Parsing GenBank data (Bio::GenBank class)" -->
|
216
216
|
<p>We assume that you already have some GenBank data files. (If you don't,
|
217
217
|
download some .seq files from ftp://ftp.ncbi.nih.gov/genbank/)</p>
|
218
|
-
<p>As an example we fetch the ID, definition and sequence of each entry
|
218
|
+
<p>As an example we will fetch the ID, definition and sequence of each entry
|
219
219
|
from the GenBank format and convert it to FASTA. This is also an example
|
220
220
|
script in the BioRuby distribution.</p>
|
221
221
|
<p>A first attempt could be to use the Bio::GenBank class for reading in
|
@@ -256,7 +256,7 @@ ff.each_entry do |f|
|
|
256
256
|
puts "nalen : " + f.nalen.to_s
|
257
257
|
puts "naseq : " + f.naseq
|
258
258
|
end</pre>
|
259
|
-
<p>In above two scripts, the first arguments of Bio::FlatFile.new are
|
259
|
+
<p>In the above two scripts, the first arguments of Bio::FlatFile.new are
|
260
260
|
database classes of BioRuby. This is expanded on in a later section.</p>
|
261
261
|
<p>Again another option is to use the Bio::DB.open class:</p>
|
262
262
|
<pre>#!/usr/bin/env ruby
|
@@ -311,12 +311,9 @@ end</pre>
|
|
311
311
|
<ul>
|
312
312
|
<li>Note: In this example Feature#assoc method makes a Hash from a
|
313
313
|
feature object. It is useful because you can get data from the hash
|
314
|
-
by using qualifiers as keys.
|
315
|
-
(But there is a risk some information is lost when two or more
|
316
|
-
qualifiers are the same. Therefore an Array is returned by
|
317
|
-
Feature#feature)</li>
|
314
|
+
by using qualifiers as keys. But there is a risk some information is lost when two or more qualifiers are the same. Therefore an Array is returned by Feature#feature.</li>
|
318
315
|
</ul>
|
319
|
-
<p>Bio::Sequence#splicing splices
|
316
|
+
<p>Bio::Sequence#splicing splices subsequences from nucleic acid sequences
|
320
317
|
according to location information used in GenBank, EMBL and DDBJ.</p>
|
321
318
|
<p>When the specified translation table is different from the default
|
322
319
|
(universal), or when the first codon is not "atg" or the protein
|
@@ -332,7 +329,7 @@ bio/location.rb.</p>
|
|
332
329
|
<pre>locs = Bio::Locations.new('join((8298.8300)..10206,1..855)')
|
333
330
|
naseq.splicing(locs)</pre></li>
|
334
331
|
</ul>
|
335
|
-
<p>You can also use
|
332
|
+
<p>You can also use this splicing method for amino acid sequences
|
336
333
|
(Bio::Sequence::AA objects).</p>
|
337
334
|
<ul>
|
338
335
|
<li><p>Splicing peptide from a protein (e.g. signal peptide)</p>
|
@@ -344,10 +341,7 @@ with classes like Bio::GenBank, Bio::KEGG::GENES. A full list can be found in
|
|
344
341
|
the ./lib/bio/db directory of the BioRuby source tree.</p>
|
345
342
|
<p>In many cases the Bio::DatabaseClass acts as a factory pattern
|
346
343
|
and recognises the database type automatically - returning a
|
347
|
-
parsed object. For example using Bio::FlatFile
|
348
|
-
<p>Bio::FlatFile class as described above. The first argument of the
|
349
|
-
Bio::FlatFile.new is database class name in BioRuby (such as Bio::GenBank,
|
350
|
-
Bio::KEGG::GENES and so on).</p>
|
344
|
+
parsed object. For example using Bio::FlatFile class as described above. The first argument of the Bio::FlatFile.new is database class name in BioRuby (such as Bio::GenBank, Bio::KEGG::GENES and so on).</p>
|
351
345
|
<pre>ff = Bio::FlatFile.new(Bio::DatabaseClass, ARGF)</pre>
|
352
346
|
<p>Isn't it wonderful that Bio::FlatFile automagically recognizes each
|
353
347
|
database class?</p>
|
@@ -361,16 +355,15 @@ ff.each_entry do |entry|
|
|
361
355
|
p entry.definition # definition of the entry
|
362
356
|
p entry.seq # sequence data of the entry
|
363
357
|
end</pre>
|
364
|
-
<p>An example that can take any input, filter using a regular expression
|
358
|
+
<p>An example that can take any input, filter using a regular expression and output
|
365
359
|
to a FASTA file can be found in sample/any2fasta.rb. With this technique it is
|
366
360
|
possible to write a Unix type grep/sort pipe for sequence information. One
|
367
361
|
example using scripts in the BIORUBY sample folder:</p>
|
368
362
|
<pre>fastagrep.rb '/At|Dm/' database.seq | fastasort.rb</pre>
|
369
|
-
<p>greps the database for Arabidopsis and Drosophila entries and sorts the output
|
370
|
-
to FASTA.</p>
|
363
|
+
<p>greps the database for Arabidopsis and Drosophila entries and sorts the output to FASTA.</p>
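Review note: the grep-then-sort pipeline can be pictured with a small plain-Ruby sketch. The three helpers below are hypothetical stand-ins. The real sample/fastagrep.rb and sample/fastasort.rb may work differently, and sorting by definition line is an assumption made here.

```ruby
# Hypothetical stand-ins for the grep and sort steps; not the sample scripts.

# Split FASTA text into [definition, sequence] pairs.
def parse_fasta(text)
  text.split(/^>/).reject(&:empty?).map do |chunk|
    definition, *seq_lines = chunk.lines.map(&:chomp)
    [definition, seq_lines.join]
  end
end

# Keep entries whose definition line matches the pattern.
def fasta_grep(entries, pattern)
  entries.select { |definition, _seq| definition =~ pattern }
end

# Assumption: "sort" means ordering entries by definition line.
def fasta_sort(entries)
  entries.sort_by { |definition, _seq| definition }
end

# Made-up three-entry input, for illustration only.
fasta = ">Dm_gene1\natgc\n>Hs_gene2\nggcc\n>At_gene3\nttaa\n"
fasta_sort(fasta_grep(parse_fasta(fasta), /At|Dm/)).each do |definition, seq|
  puts ">#{definition}"
  puts seq
end
```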
|
371
364
|
<p>Other methods to extract specific data from database objects can be
|
372
365
|
different between databases, though some methods are common (see the
|
373
|
-
guidelines for common methods
|
366
|
+
guidelines for common methods in bio/db.rb).</p>
|
374
367
|
<ul>
|
375
368
|
<li>entry_id --> gets ID of the entry</li>
|
376
369
|
<li>definition --> gets definition of the entry</li>
|
@@ -380,14 +373,13 @@ guidelines for common methods as described in bio/db.rb).</p>
|
|
380
373
|
</ul>
|
381
374
|
<p>Refer to the documents of each database to find the exact naming
|
382
375
|
of the included methods.</p>
|
383
|
-
<p>In
|
384
|
-
name is plural the method returns some object as an Array. For
|
376
|
+
<p>In general, BioRuby uses the following conventions: when a method
|
377
|
+
name is plural, the method returns some object as an Array. For
|
385
378
|
example, some classes have a "references" method which returns
|
386
379
|
multiple Bio::Reference objects as an Array. And some classes have a
|
387
380
|
"reference" method which returns a single Bio::Reference object.</p>
|
388
381
|
<h3><a name="label-6" id="label-6">Alignments (Bio::Alignment)</a></h3><!-- RDLabel: "Alignments (Bio::Alignment)" -->
|
389
|
-
<p>Bio::Alignment class in bio/alignment.rb is a container class like Ruby's Hash
|
390
|
-
Array and BioPerl's Bio::SimpleAlign. A very simple example is:</p>
|
382
|
+
<p>The Bio::Alignment class in bio/alignment.rb is a container class like Ruby's Hash and Array classes and BioPerl's Bio::SimpleAlign. A very simple example is:</p>
|
391
383
|
<pre>bioruby> seqs = [ 'atgca', 'aagca', 'acgca', 'acgcg' ]
|
392
384
|
bioruby> seqs = seqs.collect{ |x| Bio::Sequence::NA.new(x) }
|
393
385
|
# creates alignment object
|
@@ -417,15 +409,32 @@ a.each_site { |x| p x }
|
|
417
409
|
# clustalw command must be installed.
|
418
410
|
factory = Bio::ClustalW.new
|
419
411
|
a2 = a.do_align(factory)</pre>
|
412
|
+
<p>Read a ClustalW or Muscle 'ALN' alignment file:</p>
|
413
|
+
<pre>bioruby> aln = Bio::ClustalW::Report.new(File.read('../test/data/clustalw/example1.aln'))
|
414
|
+
bioruby> aln.header
|
415
|
+
==> "CLUSTAL 2.0.9 multiple sequence alignment"</pre>
|
416
|
+
<p>Fetch a sequence:</p>
|
417
|
+
<pre>bioruby> seq = aln.get_sequence(1)
|
418
|
+
bioruby> seq.definition
|
419
|
+
==> "gi|115023|sp|P10425|"</pre>
|
420
|
+
<p>Get a partial sequence:</p>
|
421
|
+
<pre>bioruby> seq.to_s[60..120]
|
422
|
+
==> "LGYFNG-EAVPSNGLVLNTSKGLVLVDSSWDNKLTKELIEMVEKKFQKRVTDVIITHAHAD"</pre>
|
423
|
+
<p>Show the full alignment residue match information for the sequences in the set:</p>
|
424
|
+
<pre>bioruby> aln.match_line[60..120]
|
425
|
+
==> " . **. . .. ::*: . * : : . .: .* * *"</pre>
|
426
|
+
<p>Return a Bio::Alignment object:</p>
|
427
|
+
<pre>bioruby> aln.alignment.consensus[60..120]
|
428
|
+
==> "???????????SN?????????????D??????????L??????????????????H?H?D"</pre>
|
420
429
|
<h2><a name="label-7" id="label-7">Restriction Enzymes (Bio::RE)</a></h2><!-- RDLabel: "Restriction Enzymes (Bio::RE)" -->
|
421
430
|
<p>BioRuby has extensive support for restriction enzymes (REs). It contains a full
|
422
431
|
library of commonly used REs (from REBASE) which can be used to cut single
|
423
|
-
stranded RNA or
|
432
|
+
stranded RNA or double stranded DNA into fragments. To list all enzymes:</p>
|
424
433
|
<pre>rebase = Bio::RestrictionEnzyme.rebase
|
425
434
|
rebase.each do |enzyme_name, info|
|
426
435
|
p enzyme_name
|
427
436
|
end</pre>
|
428
|
-
<p>and cut a sequence with an enzyme follow up with:</p>
|
437
|
+
<p>and to cut a sequence with an enzyme follow up with:</p>
|
429
438
|
<pre>res = seq.cut_with_enzyme('EcoRII', {:max_permutations => 0},
|
430
439
|
{:view_ranges => true})
|
431
440
|
if res.kind_of? Symbol #error
|
@@ -451,12 +460,12 @@ res.each do |frag|
|
|
451
460
|
<p>Let's start with a query.pep file which contains a sequence in FASTA
|
452
461
|
format. In this example we are going to execute a homology search
|
453
462
|
from a remote internet site or on your local machine. Note that you
|
454
|
-
can use the ssearch program instead of fasta when you use
|
463
|
+
can use the ssearch program instead of fasta when you use it in your
|
455
464
|
local machine.</p>
|
456
465
|
<h3><a name="label-9" id="label-9">using FASTA in local machine</a></h3><!-- RDLabel: "using FASTA in local machine" -->
|
457
466
|
<p>Install the fasta program on your machine (the command name looks like
|
458
|
-
fasta34. FASTA can be downloaded from ftp://ftp.virginia.edu/pub/fasta/)
|
459
|
-
First, you must prepare your FASTA-formatted database sequence file
|
467
|
+
fasta34. FASTA can be downloaded from ftp://ftp.virginia.edu/pub/fasta/).</p>
|
468
|
+
<p>First, you must prepare your FASTA-formatted database sequence file
|
460
469
|
target.pep and FASTA-formatted query.pep. </p>
|
461
470
|
<pre>#!/usr/bin/env ruby
|
462
471
|
|
@@ -489,19 +498,18 @@ ff.each do |entry|
|
|
489
498
|
end
|
490
499
|
end
|
491
500
|
end</pre>
|
492
|
-
<p>We named above script
|
501
|
+
<p>We named above script f_search.rb. You can execute it as follows:</p>
|
493
502
|
<pre>% ./f_search.rb query.pep target.pep > f_search.out</pre>
|
494
503
|
<p>In above script, the variable "factory" is a factory object for executing
|
495
504
|
FASTA many times easily. Instead of using Fasta#query method,
|
496
505
|
Bio::Sequence#fasta method can be used.</p>
|
497
506
|
<pre>seq = ">test seq\nYQVLEEIGRGSFGSVRKVIHIPTKKLLVRKDIKYGHMNSKE"
|
498
507
|
seq.fasta(factory)</pre>
|
499
|
-
<p>When you want to add options to FASTA
|
500
|
-
third argument of Bio::Fasta.local method. For example,
|
501
|
-
and getting top-10 hits:</p>
|
508
|
+
<p>When you want to add options to FASTA commands, you can set the
|
509
|
+
third argument of the Bio::Fasta.local method. For example, the following sets ktup to 1 and gets a list of the top 10 hits:</p>
|
502
510
|
<pre>factory = Bio::Fasta.local('fasta34', 'target.pep', '-b 10')
|
503
511
|
factory.ktup = 1</pre>
|
504
|
-
<p>Bio::Fasta#query returns Bio::Fasta::Report object.
|
512
|
+
<p>Bio::Fasta#query returns a Bio::Fasta::Report object.
|
505
513
|
We can get almost all information described in FASTA report text
|
506
514
|
with the Report object. For example, getting information for hits:</p>
|
507
515
|
<pre>report.each do |hit|
|
@@ -527,11 +535,10 @@ with the Report object. For example, getting information for hits:</p>
|
|
527
535
|
# in hit(target) sequence
|
528
536
|
puts hit.lap_at # array of above four numbers
|
529
537
|
end</pre>
|
530
|
-
<p>Most of above methods are common
|
531
|
-
below. Please refer to
|
538
|
+
<p>Most of above methods are common to the Bio::Blast::Report described
|
539
|
+
below. Please refer to the documentation of the Bio::Fasta::Report class for
|
532
540
|
FASTA-specific details.</p>
|
533
|
-
<p>If you need original output text of FASTA program you can use the "output"
|
534
|
-
method of the factory object after the "query" method.</p>
|
541
|
+
<p>If you need the original output text of FASTA program you can use the "output" method of the factory object after the "query" method.</p>
|
535
542
|
<pre>report = factory.query(entry)
|
536
543
|
puts factory.output</pre>
|
537
544
|
<h3><a name="label-10" id="label-10">using FASTA from a remote internet site</a></h3><!-- RDLabel: "using FASTA from a remote internet site" -->
|
@@ -558,7 +565,7 @@ same things as with a local method.</p>
|
|
558
565
|
<p>Select the databases you require. Next, give the search program from
|
559
566
|
the type of query sequence and database.</p>
|
560
567
|
<ul>
|
561
|
-
<li>When query is
|
568
|
+
<li>When query is an amino acid sequence
|
562
569
|
<ul>
|
563
570
|
<li>When protein database, program is "fasta".</li>
|
564
571
|
<li>When nucleic database, program is "tfasta".</li>
|
@@ -566,10 +573,10 @@ the type of query sequence and database.</p>
|
|
566
573
|
<li>When query is a nucleic acid sequence
|
567
574
|
<ul>
|
568
575
|
<li>When nucleic database, program is "fasta".</li>
|
569
|
-
<li>(When protein database,
|
576
|
+
<li>(When protein database, the search would fail.)</li>
|
570
577
|
</ul></li>
|
571
578
|
</ul>
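Review note: the selection rules in the list above amount to a small lookup table. The constant and method below are hypothetical and are not part of BioRuby. They only restate the list in code.

```ruby
# Hypothetical lookup table built from the list above; not part of BioRuby.
# Keys are [query type, database type]; values are the program name.
FASTA_PROGRAMS = {
  [:amino_acid, :protein]   => "fasta",
  [:amino_acid, :nucleic]   => "tfasta",
  [:nucleic_acid, :nucleic] => "fasta"
}.freeze

def fasta_program_for(query_type, database_type)
  FASTA_PROGRAMS.fetch([query_type, database_type]) do
    # A nucleic acid query against a protein database has no entry,
    # matching the list's note that such a search would fail.
    raise ArgumentError,
          "no FASTA program for #{query_type} query against #{database_type} database"
  end
end

puts fasta_program_for(:amino_acid, :nucleic)
```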
|
572
|
-
<p>For example:</p>
|
579
|
+
<p>For example, run:</p>
|
573
580
|
<pre>program = 'fasta'
|
574
581
|
database = 'genes'
|
575
582
|
|
@@ -600,7 +607,7 @@ The parameter "program" is different from FASTA - as you can expect:</p>
|
|
600
607
|
<p>Bio::BLAST uses "-m 7" XML output of BLAST by default when either
|
601
608
|
XMLParser or REXML (both of them are XML parser libraries for Ruby -
|
602
609
|
of the two XMLParser is the fastest) is installed on your computer. In
|
603
|
-
Ruby version 1.8.0
|
610
|
+
Ruby version 1.8.0 or later, REXML is bundled with Ruby's
|
604
611
|
distribution.</p>
|
605
612
|
<p>When no XML parser library is present, Bio::BLAST uses "-m 8" tabular
|
606
613
|
deliminated format. Available information is limited with the
|
@@ -631,9 +638,9 @@ midline.</p>
|
|
631
638
|
puts hit.lap_at
|
632
639
|
end</pre>
|
633
640
|
<p>For simplicity and API compatibility, some information such as score
|
634
|
-
|
641
|
+
is extracted from the first Hsp (High-scoring Segment Pair).</p>
|
635
642
|
<p>Check the documentation for Bio::Blast::Report to see what can be
|
636
|
-
retrieved. For now suffice to
|
643
|
+
retrieved. For now suffice to say that Bio::Blast::Report has a
|
637
644
|
hierarchical structure mirroring the general BLAST output stream:</p>
|
638
645
|
<ul>
|
639
646
|
<li>In a Bio::Blast::Report object, @iterations is an array of
|
@@ -699,11 +706,10 @@ want to add other sites, you must write the following:</p>
|
|
699
706
|
named "exec_MYSITE" to get query sequence and to pass the result to
|
700
707
|
Bio::Blast::Report.new(or Bio::Blast::Default::Report.new):</p>
|
701
708
|
<pre>factory = Bio::Blast.remote(program, db, option, 'MYSITE')</pre>
|
702
|
-
<p>When you write above routines, please send to the BioRuby project and
|
703
|
-
they may be included.</p>
|
709
|
+
<p>When you write above routines, please send them to the BioRuby project, and they may be included in future releases.</p>
|
704
710
|
<h2><a name="label-14" id="label-14">Generate a reference list using PubMed (Bio::PubMed)</a></h2><!-- RDLabel: "Generate a reference list using PubMed (Bio::PubMed)" -->
|
705
711
|
<p>Nowadays using NCBI E-Utils is recommended. Use Bio::PubMed.esearch
|
706
|
-
and Bio::PubMed.efetch
|
712
|
+
and Bio::PubMed.efetch.</p>
|
707
713
|
<pre>#!/usr/bin/env ruby
|
708
714
|
|
709
715
|
require 'bio'
|
@@ -741,7 +747,7 @@ BibTeX format bibliography data to a file named genoinfo.bib.</p>
|
|
741
747
|
% ./pmsearch.rb genome bioinformatics >> genoinfo.bib</pre>
|
742
748
|
<p>The BibTeX can be used with Tex or LaTeX to form bibliography
|
743
749
|
information with your journal article. For more information
|
744
|
-
on BibTex see
|
750
|
+
on using BibTex see <a href="http://www.bibtex.org/Using/">BibTex HowTo site</a>. A quick example:</p>
|
745
751
|
<p>Save this to hoge.tex:</p>
|
746
752
|
<pre>\documentclass{jarticle}
|
747
753
|
\begin{document}
|
@@ -754,12 +760,11 @@ foo bar KEGG database~\cite{PMID:10592173} baz hoge fuga.
|
|
754
760
|
% bibtex hoge # processes genoinfo.bib
|
755
761
|
% latex hoge # creates bibliography list
|
756
762
|
% latex hoge # inserts correct bibliography reference</pre>
|
757
|
-
<p>Now, you get hoge.dvi and hoge.ps - the latter
|
758
|
-
Postscript viewer.</p>
|
763
|
+
<p>Now, you get hoge.dvi and hoge.ps - the latter of which can be viewed with any Postscript viewer.</p>
|
759
764
|
<h3><a name="label-16" id="label-16">Bio::Reference#bibitem</a></h3><!-- RDLabel: "Bio::Reference#bibitem" -->
|
760
765
|
<p>When you don't want to create a bib file, you can use
|
761
766
|
Bio::Reference#bibitem method instead of Bio::Reference#bibtex.
|
762
|
-
In above pmfetch.rb and pmsearch.rb scripts, change</p>
|
767
|
+
In the above pmfetch.rb and pmsearch.rb scripts, change</p>
|
763
768
|
<pre>puts reference.bibtex</pre>
|
764
769
|
<p>to</p>
|
765
770
|
<pre>puts reference.bibitem</pre>
|
@@ -801,12 +806,12 @@ BioRuby and other projects' members (2002).</p>
|
|
801
806
|
</ul></li>
|
802
807
|
<li>BioSQL
|
803
808
|
<ul>
|
804
|
-
<li>Schemas to store sequence data to relational
|
809
|
+
<li>Schemas to store sequence data to relational databases such as
|
805
810
|
MySQL and PostgreSQL, and methods to retrieve entries from the database.</li>
|
806
811
|
</ul></li>
|
807
812
|
</ul>
|
808
|
-
<p>
|
809
|
-
<a href="http://obda.open-bio.org
|
813
|
+
<p>This tutorial only gives a quick overview of OBDA. Check out
|
814
|
+
<a href="http://obda.open-bio.org">the OBDA site</a> for more extensive details.</p>
|
810
815
|
<h2><a name="label-18" id="label-18">BioRegistry</a></h2><!-- RDLabel: "BioRegistry" -->
|
811
816
|
<p>BioRegistry allows for locating retrieval methods and database
|
812
817
|
locations through configuration files. The priorities are</p>
|
@@ -821,14 +826,14 @@ when all local configulation files are not available.</p>
|
|
821
826
|
<p>In the current BioRuby implementation all local configulation files
|
822
827
|
are read. For databases with the same name settings encountered first
|
823
828
|
are used. This means that if you don't like some settings of a
|
824
|
-
database in system global configuration file
|
825
|
-
(/etc/bioinformatics/seqdatabase.ini), you can easily override
|
829
|
+
database in the system's global configuration file
|
830
|
+
(/etc/bioinformatics/seqdatabase.ini), you can easily override them by
|
826
831
|
writing settings to ~/.bioinformatics/seqdatabase.ini.</p>
|
827
832
|
<p>The syntax of the configuration file is called a stanza format. For example</p>
|
828
833
|
<pre>[DatabaseName]
|
829
834
|
protocol=ProtocolName
|
830
|
-
location=
|
831
|
-
<p>You can write a description like above entry for every database.</p>
|
835
|
+
location=ServerName</pre>
|
836
|
+
<p>You can write a description like the above entry for every database.</p>
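Review note: to make the stanza format concrete, here is a minimal plain-Ruby parser sketch. `parse_stanzas` is hypothetical and is not how Bio::Registry reads its files. It keeps only the first stanza for each database name, following the tutorial's statement that settings encountered first are used.

```ruby
# Hypothetical minimal parser for the stanza format shown above.
# Illustration only; Bio::Registry's real parsing may differ.
def parse_stanzas(text)
  databases = {}
  current = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if (m = line.match(/\A\[(.+)\]\z/))
      name = m[1]
      # First stanza for a name wins; later duplicates are ignored.
      current = databases.key?(name) ? nil : (databases[name] = {})
    elsif current && line.include?("=")
      key, value = line.split("=", 2)
      current[key.strip] = value.strip
    end
  end
  databases
end

# Placeholder values taken from the tutorial's own example stanza.
config = <<~INI
  [DatabaseName]
  protocol=ProtocolName
  location=ServerName
INI

parse_stanzas(config).each do |name, settings|
  puts "#{name}: #{settings.inspect}"
end
```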
|
832
837
|
<p>The database name is a local label for yourself, so you can name it
|
833
838
|
freely and it can differ from the name of the actual databases. In the
|
834
839
|
actual specification of BioRegistry where there are two or more
|
@@ -836,8 +841,8 @@ settings for a database of the same name, it is proposed that
|
|
836
841
|
connection to the database is tried sequentially with the order
|
837
842
|
written in configuration files. However, this has not (yet) been
|
838
843
|
implemented in BioRuby.</p>
|
839
|
-
<p>In addition, for some
|
840
|
-
other than locations (e.g. user name
|
844
|
+
<p>In addition, for some protocols, you must set additional options
|
845
|
+
other than locations (e.g. user name for MySQL). In the BioRegistory
|
841
846
|
specification, current available protocols are:</p>
|
842
847
|
<ul>
|
843
848
|
<li>index-flat</li>
|
@@ -850,8 +855,7 @@ specification, current available protocols are:</p>
|
|
850
855
|
<p>In BioRuby, you can use index-flat, index-berkleydb, biofetch and biosql.
|
851
856
|
Note that the BioRegistry specification sometimes gets updated and BioRuby
|
852
857
|
does not always follow quickly.</p>
|
853
|
-
<p>Here an example.
|
854
|
-
files:</p>
|
858
|
+
<p>Here is an example. It creates a Bio::Registry object and reads the configuration files:</p>
|
855
859
|
<pre>reg = Bio::Registry.new
|
856
860
|
|
857
861
|
# connects to the database "genbank"
|
@@ -859,42 +863,39 @@ serv = reg.get_database('genbank')
|
|
859
863
|
|
860
864
|
# gets entry of the ID
|
861
865
|
entry = serv.get_by_id('AA2CG')</pre>
|
862
|
-
<p>The variable "serv" is a server object corresponding to the
|
863
|
-
written in configuration files. The class of the object is one of
|
866
|
+
<p>The variable "serv" is a server object corresponding to the settings
|
867
|
+
written in the configuration files. The class of the object is one of
|
864
868
|
Bio::SQL, Bio::Fetch, and so on. Note that Bio::Registry#get_database("name")
|
865
869
|
returns nil if no database is found.</p>
|
866
|
-
<p>After that, you can use get_by_id method and some specific methods.
|
867
|
-
Please refer to below
|
870
|
+
<p>After that, you can use the get_by_id method and some specific methods.
|
871
|
+
Please refer to the sections below for more information.</p>
|
868
872
|
<h2><a name="label-19" id="label-19">BioFlat</a></h2><!-- RDLabel: "BioFlat" -->
|
869
873
|
<p>BioFlat is a mechanism to create index files of flat files and to retrieve
|
870
874
|
these entries fast. There are two index types. index-flat is a simple index
|
871
|
-
performing binary search without using
|
875
|
+
performing binary search without using any external libraries of Ruby. index-berkeleydb
|
872
876
|
uses Berkeley DB for indexing - but requires installing bdb on your computer,
|
873
|
-
as well as the BDB Ruby package.
|
874
|
-
br_bioflat.rb command bundled with BioRuby.</p>
|
877
|
+
as well as the BDB Ruby package. To create the index itself, you can use the br_bioflat.rb command bundled with BioRuby.</p>
|
875
878
|
<pre>% br_bioflat.rb --makeindex database_name [--format data_format] filename...</pre>
|
876
879
|
<p>The format can be omitted because BioRuby has autodetection. If that
|
877
|
-
|
878
|
-
database class.</p>
|
880
|
+
doesn't work, you can try specifying the data format as the name of a BioRuby database class.</p>
|
879
881
|
<p>Search and retrieve data from database:</p>
|
880
882
|
<pre>% br_bioflat.rb database_name identifier</pre>
|
881
|
-
<p>For example, to create index of GenBank files gbbct*.seq and get entry
|
882
|
-
from the database:</p>
|
883
|
+
<p>For example, to create an index of GenBank files gbbct*.seq and get the entry from the database:</p>
|
883
884
|
<pre>% br_bioflat.rb --makeindex my_bctdb --format GenBank gbbct*.seq
|
884
885
|
% br_bioflat.rb my_bctdb A16STM262</pre>
|
885
886
|
<p>If you have Berkeley DB on your system and installed the bdb extension
|
886
|
-
module of Ruby (see http://raa.ruby-lang.org/project/bdb/), you can
|
887
|
+
module of Ruby (see <a href="http://raa.ruby-lang.org/project/bdb/">the BDB project page</a>), you can
|
887
888
|
create and search indexes with Berkeley DB - a very fast alternative
|
888
889
|
that uses little computer memory. When creating the index, use the
|
889
890
|
"--makeindex-bdb" option instead of "--makeindex".</p>
|
890
891
|
<pre>% br_bioflat.rb --makeindex-bdb database_name [--format data_format] filename...</pre>
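<p>To illustrate the idea behind index-flat (a sorted index searched by binary search, using nothing outside the Ruby standard library), here is a minimal sketch. It is not BioRuby's actual index format or code: the helper names build_flat_index and fetch_entry are hypothetical, and the input is assumed to be a FASTA-like flat file whose entries start with a ">" line.</p>

```ruby
require 'stringio'

# Scan a FASTA-like flat file and return [entry_id, byte_offset] pairs
# sorted by entry_id. (Hypothetical helper, not part of BioRuby.)
def build_flat_index(io)
  index = []
  io.rewind
  until io.eof?
    offset = io.pos
    line = io.gets
    # The ID is assumed to be the first word after ">".
    index << [line[1..-1].split.first, offset] if line.start_with?('>')
  end
  index.sort_by { |id, _offset| id }
end

# Look up an ID by binary search over the sorted index, then seek to the
# stored offset and read lines until the next entry begins.
def fetch_entry(io, index, id)
  hit = index.bsearch { |entry_id, _offset| id <=> entry_id }
  return nil unless hit

  io.seek(hit[1])
  entry = io.gets
  while (line = io.gets) && !line.start_with?('>')
    entry << line
  end
  entry
end

flat_file = StringIO.new(">b2\nGGCC\n>a1 first\nATGC\nTTAA\n")
flat_index = build_flat_index(flat_file)
puts fetch_entry(flat_file, flat_index, 'a1')
```

<p>Only the small sorted index is searched, and a lookup reads just the one entry from the flat file. That is the property that makes this kind of index fast for large files.</p>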
|
891
892
|
<h2><a name="label-20" id="label-20">BioFetch</a></h2><!-- RDLabel: "BioFetch" -->
|
892
893
|
<pre>Note: this section is an advanced topic</pre>
|
893
|
-
<p>BioFetch is a database retrieval mechanism via CGI.
|
894
|
-
options and error codes are standardized.
|
894
|
+
<p>BioFetch is a database retrieval mechanism via CGI. CGI parameters,
|
895
|
+
options and error codes are standardized. Client access via
|
895
896
|
HTTP is possible by giving the database name, identifiers and format to
|
896
897
|
retrieve entries.</p>
|
897
|
-
<p>The BioRuby project has a BioFetch server
|
898
|
+
<p>The BioRuby project has a BioFetch server at bioruby.org. It uses
|
898
899
|
GenomeNet's DBGET system as a backend. The source code of the
|
899
900
|
server is in the sample/ directory. Currently, there are only two
|
900
901
|
BioFetch servers in the world: bioruby.org and EBI.</p>
|
@@ -912,18 +913,18 @@ entry = serv.fetch(db_name, entry_id)</pre></li>
|
|
912
913
|
serv = reg.get_database('genbank')
|
913
914
|
entry = serv.get_by_id('AA2CG')</pre></li>
|
914
915
|
</ol>
|
915
|
-
<p>If you want to use (4), you
|
916
|
-
in seqdatabase.ini.
|
916
|
+
<p>If you want to use (4), you have to include some settings
|
917
|
+
in seqdatabase.ini. For example:</p>
|
917
918
|
<pre>[genbank]
|
918
919
|
protocol=biofetch
|
919
920
|
location=http://bioruby.org/cgi-bin/biofetch.rb
|
920
921
|
biodbname=genbank</pre>
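<p>As a rough illustration of what such a configuration section contains, the sketch below reads sections and key=value pairs into nested hashes. This is not how Bio::Registry parses its files internally; parse_seqdatabase_ini is a hypothetical helper written only for this example.</p>

```ruby
# Parse "[section]" headers and "key=value" lines into
# { "section" => { "key" => "value" } }.
# (Hypothetical helper for illustration; not BioRuby's own parser.)
def parse_seqdatabase_ini(text)
  databases = {}
  current = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?('#')

    if (m = line.match(/\A\[(.+)\]\z/))
      # A new section starts; later key=value lines belong to it.
      current = databases[m[1]] = {}
    elsif current && line.include?('=')
      # Split on the first "=" only, so URLs containing "=" stay intact.
      key, value = line.split('=', 2)
      current[key.strip] = value.strip
    end
  end
  databases
end

config = parse_seqdatabase_ini(<<~INI)
  [genbank]
  protocol=biofetch
  location=http://bioruby.org/cgi-bin/biofetch.rb
  biodbname=genbank
INI
puts config['genbank']['protocol']
```

<p>Each section name is the name you pass to get_database, and the protocol key selects which kind of server object is created.</p>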
|
921
922
|
<h3><a name="label-21" id="label-21">The combination of BioFetch, Bio::KEGG::GENES and Bio::AAindex1</a></h3><!-- RDLabel: "The combination of BioFetch, Bio::KEGG::GENES and Bio::AAindex1" -->
|
922
|
-
<p>Bioinformatics is often about
|
923
|
-
example
|
924
|
-
Halobacterium from KEGG GENES database and
|
923
|
+
<p>Bioinformatics is often about gluing things together. Here is an
|
924
|
+
example that gets the bacteriorhodopsin gene (VNG1467G) of the archaeon
|
925
|
+
Halobacterium from the KEGG GENES database and gets alpha-helix index
|
925
926
|
data (BURA740101) from the AAindex (Amino acid indices and similarity
|
926
|
-
matrices) database, and
|
927
|
+
matrices) database, and shows the helix score for each 15-aa length
|
927
928
|
overlapping window.</p>
|
928
929
|
<pre>#!/usr/bin/env ruby
|
929
930
|
|
@@ -943,14 +944,14 @@ aaseq.window_search(win_size) do |subseq|
|
|
943
944
|
puts [ position, score ].join("\t")
|
944
945
|
position += 1
|
945
946
|
end</pre>
|
946
|
-
<p>The special method Bio::Fetch.query uses preset BioFetch server
|
947
|
-
|
947
|
+
<p>The special method Bio::Fetch.query uses the preset BioFetch server
|
948
|
+
at bioruby.org. (The server internally gets data from GenomeNet.
|
948
949
|
Because the KEGG/GENES database and AAindex database are not available
|
949
|
-
from other BioFetch servers, we used bioruby.org server with
|
950
|
+
from other BioFetch servers, we used the bioruby.org server with
|
950
951
|
the Bio::Fetch.query method.)</p>
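<p>The overlapping-window scoring itself can be tried offline in plain Ruby. The sketch below assumes that the score of a window is the sum of per-residue index values. The table TOY_INDEX holds made-up numbers, not the real BURA740101 data, and window_scores is a hypothetical helper rather than a BioRuby method.</p>

```ruby
# Made-up per-residue scores for illustration only;
# these are NOT the real BURA740101 values.
TOY_INDEX = { 'A' => 0.5, 'L' => 0.25, 'G' => 0.125, 'K' => 1.0 }.freeze

# Return one score per overlapping window of win_size residues.
# Residues missing from the index contribute 0.0.
def window_scores(seq, win_size, index)
  seq.chars.each_cons(win_size).map do |window|
    window.sum { |aa| index.fetch(aa, 0.0) }
  end
end

# Print position and score as tab-separated columns, as in the script above.
window_scores('ALGKA', 2, TOY_INDEX).each_with_index do |score, i|
  puts [i + 1, score].join("\t")
end
```

<p>A sequence of length n yields n - win_size + 1 windows, so a sequence shorter than the window size produces no output.</p>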
|
951
952
|
<h2><a name="label-22" id="label-22">BioSQL</a></h2><!-- RDLabel: "BioSQL" -->
|
952
|
-
<p>BioSQL is a well known schema to store and retrive biological sequences using a RDBMS like PostgreSQL or MySQL
|
953
|
-
First of all, you must install a database engine or have access to a remote one. Then create the schema and populate with the taxonomy. You can follow the <a href="http://code.open-bio.org/svnweb/index.cgi/biosql/view/biosql-schema/trunk/INSTALL">Official Guide</a> .
|
953
|
+
<p>BioSQL is a well-known schema to store and retrieve biological sequences using an RDBMS like PostgreSQL or MySQL; note that SQLite is not supported.
|
954
|
+
First of all, you must install a database engine or have access to a remote one. Then create the schema and populate it with the taxonomy. You can follow the <a href="http://code.open-bio.org/svnweb/index.cgi/biosql/view/biosql-schema/trunk/INSTALL">Official Guide</a> to accomplish these steps.
|
954
955
|
The next step is to install these gems:</p>
|
955
956
|
<ul>
|
956
957
|
<li>ActiveRecord</li>
|
@@ -958,21 +959,22 @@ Next step is to install these gems:</p>
|
|
958
959
|
<li>The layer to communicate with your preferred RDBMS (postgresql, mysql, jdbcmysql in case you are running JRuby)</li>
|
959
960
|
</ul>
|
960
961
|
<p>You can find ActiveRecord's models in /bioruby/lib/bio/io/biosql</p>
|
961
|
-
<p>When you have your database up and running, you can connect to it
|
962
|
+
<p>When you have your database up and running, you can connect to it like this:</p>
|
962
963
|
<pre>#!/usr/bin/env ruby
|
963
964
|
|
964
965
|
require 'bio'
|
965
966
|
|
966
967
|
connection = Bio::SQL.establish_connection({'development'=>{'hostname'=>"YourHostname",
|
967
|
-
|
968
|
-
|
969
|
-
|
970
|
-
|
971
|
-
|
972
|
-
|
973
|
-
|
968
|
+
'database'=>"CoolBioSeqDB",
|
969
|
+
'adapter'=>"jdbcmysql",
|
970
|
+
'username'=>"YourUser",
|
971
|
+
'password'=>"YourPassword"
|
972
|
+
}
|
973
|
+
},
|
974
|
+
'development')
|
974
975
|
|
975
|
-
#The first parameter is the hash contaning the description of the configuration similar to database.yml in Rails
|
976
|
+
#The first parameter is the hash containing the description of the configuration; similar to database.yml in Rails applications, you can declare different environments.
|
977
|
+
#The second parameter is the environment to use: 'development', 'test', or 'production'.
|
976
978
|
|
977
979
|
#To store a sequence into the database you simply need a biosequence object.
|
978
980
|
biosql_database = Bio::SQL::Biodatabase.find(:first)
|
@@ -991,27 +993,28 @@ Bio::SQL.list_databases
|
|
991
993
|
#retrieving a generic accession
|
992
994
|
bioseq = Bio::SQL.fetch_accession("YouAccession")
|
993
995
|
|
994
|
-
#If you use biosequence objects, you will find all its method mapped to BioSQL sequences.
|
996
|
+
#If you use biosequence objects, you will find all of their methods mapped to BioSQL sequences.
|
997
|
+
#But you can also access the models directly:
|
995
998
|
|
996
|
-
#get the raw sequence associated with
|
999
|
+
#get the raw sequence associated with your accession
|
997
1000
|
bioseq.entry.biosequence
|
998
1001
|
|
999
|
-
#get the length of your sequence
|
1002
|
+
#get the length of your sequence; this is the explicit form of bioseq.length
|
1000
1003
|
bioseq.entry.biosequence.length
|
1001
1004
|
|
1002
|
-
#convert the sequence
|
1005
|
+
#convert the sequence into GenBank format
|
1003
1006
|
bioseq.to_biosequence.output(:genbank)</pre>
|
1004
|
-
<p>BioSQL' <a href="http://www.biosql.org/wiki/Schema_Overview">schema</a> is not
|
1007
|
+
<p>BioSQL's <a href="http://www.biosql.org/wiki/Schema_Overview">schema</a> is not very intuitive for beginners, so spend some time on understanding it. In the end, if you know a little bit of Ruby on Rails, everything will go smoothly. You can find information on Annotation <a href="http://www.biosql.org/wiki/Annotation_Mapping">here</a>.
|
1005
1008
|
ToDo: add examples from George. I remember he did some cool post on BioSQL and Rails.</p>
|
1006
1009
|
<h1><a name="label-23" id="label-23">PhyloXML</a></h1><!-- RDLabel: "PhyloXML" -->
|
1007
1010
|
<p>PhyloXML is an XML language for saving, analyzing and exchanging data of
|
1008
|
-
annotated phylogenetic trees. PhyloXML parser in BioRuby is implemented in
|
1009
|
-
Bio::PhyloXML::Parser and writer in Bio::PhyloXML::Writer.
|
1010
|
-
More information at www.phyloxml.org</p>
|
1011
|
+
annotated phylogenetic trees. PhyloXML's parser in BioRuby is implemented in
|
1012
|
+
Bio::PhyloXML::Parser, and its writer in Bio::PhyloXML::Writer.
|
1013
|
+
More information can be found at <a href="http://www.phyloxml.org">www.phyloxml.org</a>.</p>
|
1011
1014
|
<h2><a name="label-24" id="label-24">Requirements</a></h2><!-- RDLabel: "Requirements" -->
|
1012
|
-
<p>In addition to BioRuby
|
1015
|
+
<p>In addition to BioRuby, you need the libxml Ruby bindings. To install, execute:</p>
|
1013
1016
|
<pre>% gem install -r libxml-ruby</pre>
|
1014
|
-
<p>For more information see <a href="http://libxml.rubyforge.org/install.xml"
|
1017
|
+
<p>For more information see the <a href="http://libxml.rubyforge.org/install.xml">libxml installer page</a>.</p>
|
1015
1018
|
<h2><a name="label-25" id="label-25">Parsing a file</a></h2><!-- RDLabel: "Parsing a file" -->
|
1016
1019
|
<pre>require 'bio'
|
1017
1020
|
|
@@ -1022,9 +1025,9 @@ phyloxml = Bio::PhyloXML::Parser.open('example.xml')
|
|
1022
1025
|
phyloxml.each do |tree|
|
1023
1026
|
puts tree.name
|
1024
1027
|
end</pre>
|
1025
|
-
<p>If there are several trees in the file, you can access the one you wish by
|
1028
|
+
<p>If there are several trees in the file, you can access the one you wish by specifying its index:</p>
|
1026
1029
|
<pre>tree = phyloxml[3]</pre>
|
1027
|
-
<p>You can use all Bio::Tree methods on the tree, since PhyloXML::Tree inherits from Bio::Tree. For example
|
1030
|
+
<p>You can use all Bio::Tree methods on the tree, since PhyloXML::Tree inherits from Bio::Tree. For example:</p>
|
1028
1031
|
<pre>tree.leaves.each do |node|
|
1029
1032
|
puts node.name
|
1030
1033
|
end</pre>
|
@@ -1045,7 +1048,7 @@ writer.write(tree1)
|
|
1045
1048
|
# Add another tree to the file
|
1046
1049
|
writer.write(tree2)</pre>
|
1047
1050
|
<h2><a name="label-27" id="label-27">Retrieving data</a></h2><!-- RDLabel: "Retrieving data" -->
|
1048
|
-
<p>Here is an example of how to retrieve the scientific name of the clades.</p>
|
1051
|
+
<p>Here is an example of how to retrieve the scientific name of the clades included in each tree.</p>
|
1049
1052
|
<pre>require 'bio'
|
1050
1053
|
|
1051
1054
|
phyloxml = Bio::PhyloXML::Parser.open('ncbi_taxonomy_mollusca.xml')
|
@@ -1086,7 +1089,7 @@ end
|
|
1086
1089
|
#aggtcgcggcctgtggaagtcctctcct
|
1087
1090
|
#taaatcgc--cccgtgg-agtccc-cct</pre>
|
1088
1091
|
<h2><a name="label-29" id="label-29">The BioRuby example programs</a></h2><!-- RDLabel: "The BioRuby example programs" -->
|
1089
|
-
<p>Some sample programs are stored in ./samples/ directory.
|
1092
|
+
<p>Some sample programs are stored in the ./sample/ directory. For example, the na2aa.rb program (which translates a nucleic acid sequence into an amino acid sequence) can be run using:</p>
|
1090
1093
|
<pre>./sample/na2aa.rb test/data/fasta/example1.txt </pre>
|
1091
1094
|
<h2><a name="label-30" id="label-30">Unit testing and doctests</a></h2><!-- RDLabel: "Unit testing and doctests" -->
|
1092
1095
|
<p>BioRuby comes with an extensive testing framework with over 1300 tests and 2700
|
@@ -1098,23 +1101,23 @@ in this tutorial to doctest - more info upcoming.</p>
|
|
1098
1101
|
<h2><a name="label-31" id="label-31">Further reading</a></h2><!-- RDLabel: "Further reading" -->
|
1099
1102
|
<p>See the BioRuby in anger Wiki. A lot of BioRuby's documentation exists in the
|
1100
1103
|
source code and unit tests. To really dive in you will need the latest source
|
1101
|
-
code tree. The embedded rdoc documentation can be viewed online at
|
1104
|
+
code tree. The embedded rdoc documentation for the BioRuby source code can be viewed online at
|
1102
1105
|
<a href="http://bioruby.org/rdoc/"><URL:http://bioruby.org/rdoc/></a>.</p>
|
1103
1106
|
<h2><a name="label-32" id="label-32">BioRuby Shell</a></h2><!-- RDLabel: "BioRuby Shell" -->
|
1104
|
-
<p>The BioRuby shell implementation
|
1107
|
+
<p>The BioRuby shell implementation is located in ./lib/bio/shell. It is very interesting
|
1105
1108
|
as it uses IRB (the Ruby interpreter) which is a powerful environment described in
|
1106
|
-
<a href="http://ruby-doc.org/docs/ProgrammingRuby/html/irb.html">Programming Ruby's
|
1109
|
+
<a href="http://ruby-doc.org/docs/ProgrammingRuby/html/irb.html">Programming Ruby's IRB chapter</a>. IRB commands can be typed directly into the shell, e.g.</p>
|
1107
1110
|
<pre>bioruby!> IRB.conf[:PROMPT_MODE]
|
1108
1111
|
==!> :PROMPT_C</pre>
|
1109
|
-
<p>
|
1112
|
+
<p>Additionally, you may want to install the optional Ruby readline support -
|
1110
1113
|
with Debian libreadline-ruby. To edit a previous line you may have to press
|
1111
|
-
line down (arrow
|
1114
|
+
line down (down arrow) first.</p>
|
1112
1115
|
<h1><a name="label-33" id="label-33">Helpful tools</a></h1><!-- RDLabel: "Helpful tools" -->
|
1113
1116
|
<p>Apart from rdoc you may also want to use rtags - which allows jumping around
|
1114
1117
|
source code by clicking on class and method names. </p>
|
1115
1118
|
<pre>cd bioruby/lib
|
1116
1119
|
rtags -R --vi</pre>
|
1117
|
-
<p>For a tutorial see <a href="http://rtags.rubyforge.org/"
|
1120
|
+
<p>For a tutorial see <a href="http://rtags.rubyforge.org/">here</a></p>
|
1118
1121
|
<h1><a name="label-34" id="label-34">APPENDIX</a></h1><!-- RDLabel: "APPENDIX" -->
|
1119
1122
|
<h2><a name="label-35" id="label-35">KEGG API</a></h2><!-- RDLabel: "KEGG API" -->
|
1120
1123
|
<p>Please refer to KEGG_API.rd.ja (English version: <a href="http://www.genome.jp/kegg/soap/doc/keggapi_manual.html"><URL:http://www.genome.jp/kegg/soap/doc/keggapi_manual.html></a> ) and</p>
|
@@ -1122,9 +1125,9 @@ rtags -R --vi</pre>
|
|
1122
1125
|
<li><a href="http://www.genome.jp/kegg/soap/"><URL:http://www.genome.jp/kegg/soap/></a></li>
|
1123
1126
|
</ul>
|
1124
1127
|
<h2><a name="label-36" id="label-36">Ruby Ensembl API</a></h2><!-- RDLabel: "Ruby Ensembl API" -->
|
1125
|
-
<p>Ruby Ensembl API is a
|
1128
|
+
<p>The Ruby Ensembl API is a Ruby API to the Ensembl database. It is NOT currently
|
1126
1129
|
included in the BioRuby archives. To install it, see
|
1127
|
-
<a href="http://wiki.github.com/jandot/ruby-ensembl-api"
|
1130
|
+
<a href="http://wiki.github.com/jandot/ruby-ensembl-api">the Ruby-Ensembl GitHub</a>
|
1128
1131
|
for more information.</p>
|
1129
1132
|
<h3><a name="label-37" id="label-37">Gene Ontology (GO) through the Ruby Ensembl API</a></h3><!-- RDLabel: "Gene Ontology (GO) through the Ruby Ensembl API" -->
|
1130
1133
|
<p>Gene Ontologies can be fetched through the Ruby Ensembl API package:</p>
|
@@ -1134,7 +1137,7 @@ infile = IO.readlines(ARGV.shift) # reading your comma-separated accession mappi
|
|
1134
1137
|
infile.each do |line|
|
1135
1138
|
accs = line.split(",") # Split the comma-sep.entries into an array
|
1136
1139
|
drosophila_acc = accs.shift # the first entry is the Drosophila acc
|
1137
|
-
mosq_acc = accs.shift # the second entry is
|
1140
|
+
mosq_acc = accs.shift # the second entry is your Mosq. acc
|
1138
1141
|
gene = Ensembl::Core::Gene.find_by_stable_id(drosophila_acc)
|
1139
1142
|
print "#{mosq_acc}"
|
1140
1143
|
gene.go_terms.each do |go|
|
@@ -1145,9 +1148,9 @@ end</pre>
|
|
1145
1148
|
homologues.</p>
|
1146
1149
|
<h2><a name="label-38" id="label-38">Using BioPerl or BioPython from Ruby</a></h2><!-- RDLabel: "Using BioPerl or BioPython from Ruby" -->
|
1147
1150
|
<p>At the moment there is no easy way of accessing BioPerl from Ruby. The best way, perhaps, is to create a Perl server that gets accessed through XML/RPC or SOAP.</p>
|
1148
|
-
<h2><a name="label-39" id="label-39">Installing required external
|
1151
|
+
<h2><a name="label-39" id="label-39">Installing required external libraries</a></h2><!-- RDLabel: "Installing required external libraries" -->
|
1149
1152
|
<p>At this point for using BioRuby no additional libraries are needed, except if
|
1150
|
-
you are using Bio::PhyloXML module
|
1153
|
+
you are using the Bio::PhyloXML module; then you have to install libxml-ruby.</p>
|
1151
1154
|
<p>This may change, so keep an eye on the BioRuby website. Also when
|
1152
1155
|
a package is missing BioRuby should show an informative message.</p>
|
1153
1156
|
<p>At this point installing third party Ruby packages can be a bit
|
@@ -1155,16 +1158,16 @@ painful, as the gem standard for packages evolved late and some still
|
|
1155
1158
|
force you to copy things by hand. Therefore, carefully read the READMEs
|
1156
1159
|
that come with each package.</p>
|
1157
1160
|
<h3><a name="label-40" id="label-40">Installing libxml-ruby</a></h3><!-- RDLabel: "Installing libxml-ruby" -->
|
1158
|
-
<p>The simplest way is to use
|
1161
|
+
<p>The simplest way is to use the RubyGems packaging system:</p>
|
1159
1162
|
<pre>gem install -r libxml-ruby</pre>
|
1160
1163
|
<p>If you get the error `require': no such file to load - mkmf (LoadError), then run</p>
|
1161
1164
|
<pre>sudo apt-get install ruby-dev</pre>
|
1162
|
-
<p>If you have other problems with installation, then see <a href="http://libxml.rubyforge.org/install.xml"><URL:http://libxml.rubyforge.org/install.xml></a
|
1165
|
+
<p>If you have other problems with installation, then see <a href="http://libxml.rubyforge.org/install.xml"><URL:http://libxml.rubyforge.org/install.xml></a>.</p>
|
1163
1166
|
<h2><a name="label-41" id="label-41">Trouble shooting</a></h2><!-- RDLabel: "Trouble shooting" -->
|
1164
1167
|
<ul>
|
1165
1168
|
<li>Error: in `require': no such file to load -- bio (LoadError)</li>
|
1166
1169
|
</ul>
|
1167
|
-
<p>Ruby
|
1170
|
+
<p>Ruby is failing to find the BioRuby libraries - add their location to the RUBYLIB path, or pass
|
1168
1171
|
it to the interpreter. For example:</p>
|
1169
1172
|
<pre>ruby -I$BIORUBYPATH/lib yourprogram.rb</pre>
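<p>The same effect can also be had from inside a script by extending Ruby's load path before requiring the library. This is a minimal sketch. It assumes that BIORUBYPATH is an environment variable pointing at your BioRuby source tree, as in the command above, and falls back to the current directory if it is not set.</p>

```ruby
# BIORUBYPATH is assumed to point at the BioRuby source tree;
# use the current directory when the variable is not set.
bioruby_lib = File.join(ENV.fetch('BIORUBYPATH', '.'), 'lib')

# Put the library directory at the front of the load path (once).
$LOAD_PATH.unshift(bioruby_lib) unless $LOAD_PATH.include?(bioruby_lib)

begin
  require 'bio'
rescue LoadError
  warn "BioRuby not found under #{bioruby_lib}; check BIORUBYPATH or RUBYLIB"
end
```

<p>Setting RUBYLIB or using -I keeps the script itself free of installation details, so the in-script variant is mostly useful for quick experiments.</p>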
|
1170
1173
|
<h2><a name="label-42" id="label-42">Modifying this page</a></h2><!-- RDLabel: "Modifying this page" -->
|