bio 1.4.1 → 1.4.2
- data/ChangeLog +954 -0
- data/KNOWN_ISSUES.rdoc +40 -5
- data/README.rdoc +36 -35
- data/RELEASE_NOTES.rdoc +87 -59
- data/bioruby.gemspec +24 -2
- data/doc/RELEASE_NOTES-1.4.1.rdoc +104 -0
- data/doc/Tutorial.rd +162 -200
- data/doc/Tutorial.rd.html +149 -146
- data/lib/bio.rb +1 -0
- data/lib/bio/appl/blast.rb +1 -1
- data/lib/bio/appl/blast/ddbj.rb +26 -34
- data/lib/bio/appl/blast/genomenet.rb +21 -11
- data/lib/bio/db/embl/sptr.rb +193 -21
- data/lib/bio/db/fasta.rb +1 -1
- data/lib/bio/db/fastq.rb +14 -0
- data/lib/bio/db/fastq/format_fastq.rb +2 -2
- data/lib/bio/db/genbank/ddbj.rb +1 -2
- data/lib/bio/db/genbank/format_genbank.rb +1 -1
- data/lib/bio/db/medline.rb +1 -0
- data/lib/bio/db/newick.rb +3 -1
- data/lib/bio/db/pdb/pdb.rb +9 -9
- data/lib/bio/db/pdb/residue.rb +2 -2
- data/lib/bio/io/ddbjrest.rb +344 -0
- data/lib/bio/io/ncbirest.rb +121 -1
- data/lib/bio/location.rb +2 -2
- data/lib/bio/reference.rb +3 -4
- data/lib/bio/shell/plugin/entry.rb +7 -3
- data/lib/bio/shell/plugin/ncbirest.rb +5 -1
- data/lib/bio/util/restriction_enzyme.rb +3 -0
- data/lib/bio/util/restriction_enzyme/dense_int_array.rb +195 -0
- data/lib/bio/util/restriction_enzyme/range/sequence_range.rb +7 -7
- data/lib/bio/util/restriction_enzyme/range/sequence_range/calculated_cuts.rb +57 -18
- data/lib/bio/util/restriction_enzyme/range/sequence_range/fragment.rb +2 -2
- data/lib/bio/util/restriction_enzyme/sorted_num_array.rb +219 -0
- data/lib/bio/version.rb +1 -1
- data/sample/test_restriction_enzyme_long.rb +4403 -0
- data/test/data/fasta/EFTU_BACSU.fasta +8 -0
- data/test/data/genbank/CAA35997.gp +48 -0
- data/test/data/genbank/SCU49845.gb +167 -0
- data/test/data/litdb/1717226.litdb +13 -0
- data/test/data/pir/CRAB_ANAPL.pir +6 -0
- data/test/functional/bio/appl/blast/test_remote.rb +93 -0
- data/test/functional/bio/appl/test_blast.rb +61 -0
- data/test/functional/bio/io/test_ddbjrest.rb +47 -0
- data/test/functional/bio/test_command.rb +3 -3
- data/test/unit/bio/db/embl/test_sptr.rb +6 -6
- data/test/unit/bio/db/embl/test_uniprot_new_part.rb +208 -0
- data/test/unit/bio/db/genbank/test_common.rb +274 -0
- data/test/unit/bio/db/genbank/test_genbank.rb +401 -0
- data/test/unit/bio/db/genbank/test_genpept.rb +81 -0
- data/test/unit/bio/db/pdb/test_pdb.rb +3287 -11
- data/test/unit/bio/db/test_fasta.rb +34 -12
- data/test/unit/bio/db/test_fastq.rb +26 -0
- data/test/unit/bio/db/test_litdb.rb +95 -0
- data/test/unit/bio/db/test_medline.rb +1 -0
- data/test/unit/bio/db/test_nbrf.rb +82 -0
- data/test/unit/bio/db/test_newick.rb +22 -4
- data/test/unit/bio/test_reference.rb +35 -0
- data/test/unit/bio/util/restriction_enzyme/test_dense_int_array.rb +201 -0
- data/test/unit/bio/util/restriction_enzyme/test_sorted_num_array.rb +281 -0
- metadata +44 -38
data/doc/Tutorial.rd.html
CHANGED
@@ -11,29 +11,29 @@
|
|
11
11
|
<h1><a name="label-0" id="label-0">BioRuby Tutorial</a></h1><!-- RDLabel: "BioRuby Tutorial" -->
|
12
12
|
<ul>
|
13
13
|
<li>Copyright (C) 2001-2003 KATAYAMA Toshiaki <k .at. bioruby.org></li>
|
14
|
-
<li>Copyright (C) 2005-
|
14
|
+
<li>Copyright (C) 2005-2011 Pjotr Prins, Naohisa Goto and others</li>
|
15
15
|
</ul>
|
16
|
-
<p>This document was last modified:
|
17
|
-
Current editor:
|
18
|
-
<p>The latest version resides in the GIT source code repository: ./doc/<a href="
|
16
|
+
<p>This document was last modified: 2011/03/24
|
17
|
+
Current editor: Michael O'Keefe <okeefm (at) rpi (dot) edu></p>
|
18
|
+
<p>The latest version resides in the GIT source code repository: ./doc/<a href="https://github.com/bioruby/bioruby/blob/master/doc/Tutorial.rd">Tutorial.rd</a>.</p>
|
19
19
|
<h2><a name="label-1" id="label-1">Introduction</a></h2><!-- RDLabel: "Introduction" -->
|
20
20
|
<p>This is a tutorial for using Bioruby. A basic knowledge of Ruby is required.
|
21
|
-
If you want to know more about the programming
|
21
|
+
If you want to know more about the programming language, we recommend the
|
22
22
|
latest Ruby book <a href="http://www.pragprog.com/titles/ruby">Programming Ruby</a>
|
23
|
-
by Dave Thomas and Andy Hunt - the first edition
|
23
|
+
by Dave Thomas and Andy Hunt - the first edition can be read online
|
24
24
|
<a href="http://www.ruby-doc.org/docs/ProgrammingRuby/">here</a>.</p>
|
25
25
|
<p>For BioRuby you need to install Ruby and the BioRuby package on your computer</p>
|
26
26
|
<p>You can check whether Ruby is installed on your computer and what
|
27
27
|
version it has with the</p>
|
28
28
|
<pre>% ruby -v</pre>
|
29
|
-
<p>command.
|
29
|
+
<p>command. You should see something like:</p>
|
30
30
|
<pre>ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]</pre>
|
31
31
|
<p>If you see no such thing you'll have to install Ruby using your installation
|
32
32
|
manager. For more information see the
|
33
33
|
<a href="http://www.ruby-lang.org/en/">Ruby</a> website.</p>
|
34
34
|
<p>With Ruby download and install Bioruby using the links on the
|
35
35
|
<a href="http://bioruby.org/">Bioruby</a> website. The recommended installation is via
|
36
|
-
|
36
|
+
RubyGems:</p>
|
37
37
|
<pre>gem install bio</pre>
|
38
38
|
<p>See also the Bioruby <a href="http://bioruby.open-bio.org/wiki/Installation">wiki</a>.</p>
|
39
39
|
<p>A lot of BioRuby's documentation exists in the source code and unit tests. To
|
@@ -41,9 +41,10 @@ really dive in you will need the latest source code tree. The embedded rdoc
|
|
41
41
|
documentation can be viewed online at
|
42
42
|
<a href="http://bioruby.org/rdoc/">bioruby's rdoc</a>. But first, let's get started!</p>
|
43
43
|
<h2><a name="label-2" id="label-2">Trying Bioruby</a></h2><!-- RDLabel: "Trying Bioruby" -->
|
44
|
-
<p>Bioruby comes with its own shell. After unpacking the sources run the
|
45
|
-
|
46
|
-
<
|
44
|
+
<p>Bioruby comes with its own shell. After unpacking the sources run one of the following commands:</p>
|
45
|
+
<pre>bioruby</pre>
|
46
|
+
<p>or, from the source tree</p>
|
47
|
+
<pre>cd bioruby
|
47
48
|
ruby -I lib bin/bioruby</pre>
|
48
49
|
<p>and you should see a prompt</p>
|
49
50
|
<pre>bioruby></pre>
|
@@ -60,11 +61,11 @@ question to the mailing list. BioRuby developers usually try to help.</p>
|
|
60
61
|
<h2><a name="label-3" id="label-3">Working with nucleic / amino acid sequences (Bio::Sequence class)</a></h2><!-- RDLabel: "Working with nucleic / amino acid sequences (Bio::Sequence class)" -->
|
61
62
|
<p>The Bio::Sequence class allows the usual sequence transformations and
|
62
63
|
translations. In the example below the DNA sequence "atgcatgcaaaa" is
|
63
|
-
converted into the complemental strand
|
64
|
-
next the nucleic acid composition is calculated and the sequence is
|
64
|
+
converted into the complemental strand and spliced into a subsequence;
|
65
|
+
next, the nucleic acid composition is calculated and the sequence is
|
65
66
|
translated into the amino acid sequence, the molecular weight
|
66
|
-
calculated, and so on. When translating into amino acid sequences the
|
67
|
-
frame can be specified and optionally the
|
67
|
+
calculated, and so on. When translating into amino acid sequences, the
|
68
|
+
frame can be specified and optionally the codon table selected (as
|
68
69
|
defined in codontable.rb).</p>
|
69
70
|
<pre>bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa")
|
70
71
|
==> "atgcatgcaaaa"
|
@@ -73,7 +74,7 @@ defined in codontable.rb).</p>
|
|
73
74
|
bioruby> seq.complement
|
74
75
|
==> "ttttgcatgcat"
|
75
76
|
|
76
|
-
bioruby> seq.subseq(3,8) # gets subsequence of positions 3 to 8
|
77
|
+
bioruby> seq.subseq(3,8) # gets subsequence of positions 3 to 8 (starting from 1)
|
77
78
|
==> "gcatgc"
|
78
79
|
bioruby> seq.gc_percent
|
79
80
|
==> 33
|
@@ -112,10 +113,10 @@ Windows). For example</p>
|
|
112
113
|
<pre>% ri puts
|
113
114
|
% ri p
|
114
115
|
% ri File.open</pre>
|
115
|
-
<p>Nucleic acid sequence
|
116
|
-
amino acid sequence
|
116
|
+
<p>Nucleic acid sequences are members of the Bio::Sequence::NA class, and
|
117
|
+
amino acid sequences are members of the Bio::Sequence::AA class. Shared
|
117
118
|
methods are in the parent Bio::Sequence class.</p>
|
118
|
-
<p>As Bio::Sequence
|
119
|
+
<p>As Bio::Sequence inherits from Ruby's String class, you can use
|
119
120
|
String class methods. For example, to get a subsequence, you can
|
120
121
|
not only use subseq(from, to) but also String#[].</p>
|
121
122
|
<p>Please take note that Ruby's strings are base 0 - i.e. the first letter
|
@@ -128,14 +129,13 @@ bioruby> s[0..1]
|
|
128
129
|
==> "ab"</pre>
|
129
130
|
<p>So when using String methods, you should subtract 1 from positions
|
130
131
|
conventionally used in biology. (subseq method will throw an exception if you
|
131
|
-
specify positions smaller than or equal to 0 for either one of the "from" or
|
132
|
-
"to".)</p>
|
132
|
+
specify positions smaller than or equal to 0 for either one of the "from" or "to".)</p>
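The off-by-one relationship between the two conventions can be checked in plain Ruby, without BioRuby installed. The sketch below uses a hypothetical helper, subseq_1based, that mimics the 1-based, inclusive convention described for subseq(from, to). It is an illustration only, not BioRuby's actual implementation.

```ruby
# Hypothetical helper (not part of BioRuby): 1-based, inclusive subsequence,
# mimicking the convention described for subseq(from, to).
def subseq_1based(str, from, to)
  raise ArgumentError, "positions must be >= 1" if from < 1 || to < 1
  str[(from - 1)..(to - 1)]
end

seq = "atgcatgcaaaa"
by_position = subseq_1based(seq, 3, 8)  # biological positions 3 to 8
by_index    = seq[2..7]                 # the same span with String#[] (0-based)

puts by_position
puts by_index
```

Both calls select the same six letters; only the numbering differs.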
|
133
133
|
<p>The window_search(window_size, step_size) method shows a typical Ruby
|
134
134
|
way of writing concise and clear code using 'closures'. Each sliding
|
135
135
|
window creates a subsequence which is supplied to the enclosed block
|
136
136
|
through a variable named +s+.</p>
|
137
137
|
<ul>
|
138
|
-
<li><p>Show average percentage of GC content for 20 bases (stepping the default one base at a time)
|
138
|
+
<li><p>Show average percentage of GC content for 20 bases (stepping the default one base at a time):</p>
|
139
139
|
<pre>bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa")
|
140
140
|
==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa"
|
141
141
|
|
@@ -195,8 +195,8 @@ my_naseq = Bio::Sequence::NA.new(input_seq)
|
|
195
195
|
my_aaseq = my_naseq.translate
|
196
196
|
|
197
197
|
puts my_aaseq</pre>
|
198
|
-
<p>Save the program as na2aa.rb. Prepare a nucleic acid sequence
|
199
|
-
described below and
|
198
|
+
<p>Save the program above as na2aa.rb. Prepare a nucleic acid sequence
|
199
|
+
described below and save it as my_naseq.txt:</p>
|
200
200
|
<pre>gtggcgatctttccgaaagcgatgactggagcgaagaaccaaagcagtgacatttgtctg
|
201
201
|
atgccgcacgtaggcctgataagacgcggacagcgtcgcatcaggcatcttgtgcaaatg
|
202
202
|
tcggatgcggcgtga</pre>
|
@@ -207,7 +207,7 @@ For example, translates my_naseq.txt:</p>
|
|
207
207
|
<pre>% cat my_naseq.txt|ruby na2aa.rb</pre>
|
208
208
|
<p>Outputs</p>
|
209
209
|
<pre>VAIFPKAMTGAKNQSSDICLMPHVGLIRRGQRRIRHLVQMSDAA*</pre>
|
210
|
-
<p>You can also write this, a bit
|
210
|
+
<p>You can also write this, a bit fancifully, as a one-liner script.</p>
|
211
211
|
<pre>% ruby -r bio -e 'p Bio::Sequence::NA.new($<.read).translate' my_naseq.txt</pre>
|
212
212
|
<p>In the next section we will retrieve data from databases instead of using raw
|
213
213
|
sequence files. One generic example of the above can be found in
|
@@ -215,7 +215,7 @@ sequence files. One generic example of the above can be found in
|
|
215
215
|
<h2><a name="label-4" id="label-4">Parsing GenBank data (Bio::GenBank class)</a></h2><!-- RDLabel: "Parsing GenBank data (Bio::GenBank class)" -->
|
216
216
|
<p>We assume that you already have some GenBank data files. (If you don't,
|
217
217
|
download some .seq files from ftp://ftp.ncbi.nih.gov/genbank/)</p>
|
218
|
-
<p>As an example we fetch the ID, definition and sequence of each entry
|
218
|
+
<p>As an example we will fetch the ID, definition and sequence of each entry
|
219
219
|
from the GenBank format and convert it to FASTA. This is also an example
|
220
220
|
script in the BioRuby distribution.</p>
|
221
221
|
<p>A first attempt could be to use the Bio::GenBank class for reading in
|
@@ -256,7 +256,7 @@ ff.each_entry do |f|
|
|
256
256
|
puts "nalen : " + f.nalen.to_s
|
257
257
|
puts "naseq : " + f.naseq
|
258
258
|
end</pre>
|
259
|
-
<p>In above two scripts, the first arguments of Bio::FlatFile.new are
|
259
|
+
<p>In the above two scripts, the first arguments of Bio::FlatFile.new are
|
260
260
|
database classes of BioRuby. This is expanded on in a later section.</p>
|
261
261
|
<p>Again another option is to use the Bio::DB.open class:</p>
|
262
262
|
<pre>#!/usr/bin/env ruby
|
@@ -311,12 +311,9 @@ end</pre>
|
|
311
311
|
<ul>
|
312
312
|
<li>Note: In this example Feature#assoc method makes a Hash from a
|
313
313
|
feature object. It is useful because you can get data from the hash
|
314
|
-
by using qualifiers as keys.
|
315
|
-
(But there is a risk some information is lost when two or more
|
316
|
-
qualifiers are the same. Therefore an Array is returned by
|
317
|
-
Feature#feature)</li>
|
314
|
+
by using qualifiers as keys. But there is a risk some information is lost when two or more qualifiers are the same. Therefore an Array is returned by Feature#feature.</li>
|
318
315
|
</ul>
|
319
|
-
<p>Bio::Sequence#splicing splices
|
316
|
+
<p>Bio::Sequence#splicing splices subsequences from nucleic acid sequences
|
320
317
|
according to location information used in GenBank, EMBL and DDBJ.</p>
|
321
318
|
<p>When the specified translation table is different from the default
|
322
319
|
(universal), or when the first codon is not "atg" or the protein
|
@@ -332,7 +329,7 @@ bio/location.rb.</p>
|
|
332
329
|
<pre>locs = Bio::Locations.new('join((8298.8300)..10206,1..855)')
|
333
330
|
naseq.splicing(locs)</pre></li>
|
334
331
|
</ul>
|
335
|
-
<p>You can also use
|
332
|
+
<p>You can also use this splicing method for amino acid sequences
|
336
333
|
(Bio::Sequence::AA objects).</p>
|
337
334
|
<ul>
|
338
335
|
<li><p>Splicing peptide from a protein (e.g. signal peptide)</p>
|
@@ -344,10 +341,7 @@ with classes like Bio::GenBank, Bio::KEGG::GENES. A full list can be found in
|
|
344
341
|
the ./lib/bio/db directory of the BioRuby source tree.</p>
|
345
342
|
<p>In many cases the Bio::DatabaseClass acts as a factory pattern
|
346
343
|
and recognises the database type automatically - returning a
|
347
|
-
parsed object. For example using Bio::FlatFile
|
348
|
-
<p>Bio::FlatFile class as described above. The first argument of the
|
349
|
-
Bio::FlatFile.new is database class name in BioRuby (such as Bio::GenBank,
|
350
|
-
Bio::KEGG::GENES and so on).</p>
|
344
|
+
parsed object, for example when using the Bio::FlatFile class as described above. The first argument of Bio::FlatFile.new is the database class name in BioRuby (such as Bio::GenBank, Bio::KEGG::GENES and so on).</p>
|
351
345
|
<pre>ff = Bio::FlatFile.new(Bio::DatabaseClass, ARGF)</pre>
|
352
346
|
<p>Isn't it wonderful that Bio::FlatFile automagically recognizes each
|
353
347
|
database class?</p>
|
@@ -361,16 +355,15 @@ ff.each_entry do |entry|
|
|
361
355
|
p entry.definition # definition of the entry
|
362
356
|
p entry.seq # sequence data of the entry
|
363
357
|
end</pre>
|
364
|
-
<p>An example that can take any input, filter using a regular expression
|
358
|
+
<p>An example that can take any input, filter using a regular expression and output
|
365
359
|
to a FASTA file can be found in sample/any2fasta.rb. With this technique it is
|
366
360
|
possible to write a Unix type grep/sort pipe for sequence information. One
|
367
361
|
example using scripts in the BIORUBY sample folder:</p>
|
368
362
|
<pre>fastagrep.rb '/At|Dm/' database.seq | fastasort.rb</pre>
|
369
|
-
<p>greps the database for Arabidopsis and Drosophila entries and sorts the output
|
370
|
-
to FASTA.</p>
|
363
|
+
<p>greps the database for Arabidopsis and Drosophila entries and sorts the output to FASTA.</p>
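The idea behind such a grep/sort pipe can be sketched in plain Ruby. The code below is not the fastagrep.rb or fastasort.rb script from the sample folder. It is a minimal illustration that assumes simple FASTA text, a hypothetical parse_fasta helper, and made-up records.

```ruby
# Hypothetical helper: split FASTA text into [header, sequence] pairs.
def parse_fasta(text)
  text.split(/^>/).reject(&:empty?).map do |record|
    header, *seq_lines = record.lines.map(&:chomp)
    [header, seq_lines.join]
  end
end

# Made-up example data, not real database entries.
fasta = <<~FASTA
  >Dm_gene2 fly
  ttga
  >Hs_gene1 human
  atgc
  >At_gene1 thale cress
  ggcc
  aatt
FASTA

# "grep" step: keep entries whose header matches the pattern.
selected = parse_fasta(fasta).select { |header, _| header =~ /At|Dm/ }
# "sort" step: order the surviving entries by header.
sorted = selected.sort_by { |header, _| header }

# Write the result back out in FASTA format.
sorted.each { |header, seq| puts ">#{header}", seq }
```

Because each step takes and returns a list of entries, the filter and the sort can be swapped or chained freely, which is the same property the Unix pipe relies on.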
|
371
364
|
<p>Other methods to extract specific data from database objects can be
|
372
365
|
different between databases, though some methods are common (see the
|
373
|
-
guidelines for common methods
|
366
|
+
guidelines for common methods in bio/db.rb).</p>
|
374
367
|
<ul>
|
375
368
|
<li>entry_id --> gets ID of the entry</li>
|
376
369
|
<li>definition --> gets definition of the entry</li>
|
@@ -380,14 +373,13 @@ guidelines for common methods as described in bio/db.rb).</p>
|
|
380
373
|
</ul>
|
381
374
|
<p>Refer to the documents of each database to find the exact naming
|
382
375
|
of the included methods.</p>
|
383
|
-
<p>In
|
384
|
-
name is plural the method returns some object as an Array. For
|
376
|
+
<p>In general, BioRuby uses the following conventions: when a method
|
377
|
+
name is plural, the method returns some object as an Array. For
|
385
378
|
example, some classes have a "references" method which returns
|
386
379
|
multiple Bio::Reference objects as an Array. And some classes have a
|
387
380
|
"reference" method which returns a single Bio::Reference object.</p>
|
388
381
|
<h3><a name="label-6" id="label-6">Alignments (Bio::Alignment)</a></h3><!-- RDLabel: "Alignments (Bio::Alignment)" -->
|
389
|
-
<p>Bio::Alignment class in bio/alignment.rb is a container class like Ruby's Hash
|
390
|
-
Array and BioPerl's Bio::SimpleAlign. A very simple example is:</p>
|
382
|
+
<p>The Bio::Alignment class in bio/alignment.rb is a container class like Ruby's Hash and Array classes and BioPerl's Bio::SimpleAlign. A very simple example is:</p>
|
391
383
|
<pre>bioruby> seqs = [ 'atgca', 'aagca', 'acgca', 'acgcg' ]
|
392
384
|
bioruby> seqs = seqs.collect{ |x| Bio::Sequence::NA.new(x) }
|
393
385
|
# creates alignment object
|
@@ -417,15 +409,32 @@ a.each_site { |x| p x }
|
|
417
409
|
# clustalw command must be installed.
|
418
410
|
factory = Bio::ClustalW.new
|
419
411
|
a2 = a.do_align(factory)</pre>
|
412
|
+
<p>Read a ClustalW or Muscle 'ALN' alignment file:</p>
|
413
|
+
<pre>bioruby> aln = Bio::ClustalW::Report.new(File.read('../test/data/clustalw/example1.aln'))
|
414
|
+
bioruby> aln.header
|
415
|
+
==> "CLUSTAL 2.0.9 multiple sequence alignment"</pre>
|
416
|
+
<p>Fetch a sequence:</p>
|
417
|
+
<pre>bioruby> seq = aln.get_sequence(1)
|
418
|
+
bioruby> seq.definition
|
419
|
+
==> "gi|115023|sp|P10425|"</pre>
|
420
|
+
<p>Get a partial sequence:</p>
|
421
|
+
<pre>bioruby> seq.to_s[60..120]
|
422
|
+
==> "LGYFNG-EAVPSNGLVLNTSKGLVLVDSSWDNKLTKELIEMVEKKFQKRVTDVIITHAHAD"</pre>
|
423
|
+
<p>Show the full alignment residue match information for the sequences in the set:</p>
|
424
|
+
<pre>bioruby> aln.match_line[60..120]
|
425
|
+
==> " . **. . .. ::*: . * : : . .: .* * *"</pre>
|
426
|
+
<p>Return a Bio::Alignment object:</p>
|
427
|
+
<pre>bioruby> aln.alignment.consensus[60..120]
|
428
|
+
==> "???????????SN?????????????D??????????L??????????????????H?H?D"</pre>
|
420
429
|
<h2><a name="label-7" id="label-7">Restriction Enzymes (Bio::RE)</a></h2><!-- RDLabel: "Restriction Enzymes (Bio::RE)" -->
|
421
430
|
<p>BioRuby has extensive support for restriction enzymes (REs). It contains a full
|
422
431
|
library of commonly used REs (from REBASE) which can be used to cut single
|
423
|
-
stranded RNA or
|
432
|
+
stranded RNA or double stranded DNA into fragments. To list all enzymes:</p>
|
424
433
|
<pre>rebase = Bio::RestrictionEnzyme.rebase
|
425
434
|
rebase.each do |enzyme_name, info|
|
426
435
|
p enzyme_name
|
427
436
|
end</pre>
|
428
|
-
<p>and cut a sequence with an enzyme follow up with:</p>
|
437
|
+
<p>and to cut a sequence with an enzyme follow up with:</p>
|
429
438
|
<pre>res = seq.cut_with_enzyme('EcoRII', {:max_permutations => 0},
|
430
439
|
{:view_ranges => true})
|
431
440
|
if res.kind_of? Symbol #error
|
@@ -451,12 +460,12 @@ res.each do |frag|
|
|
451
460
|
<p>Let's start with a query.pep file which contains a sequence in FASTA
|
452
461
|
format. In this example we are going to execute a homology search
|
453
462
|
from a remote internet site or on your local machine. Note that you
|
454
|
-
can use the ssearch program instead of fasta when you use
|
463
|
+
can use the ssearch program instead of fasta when you use it on your
|
455
464
|
local machine.</p>
|
456
465
|
<h3><a name="label-9" id="label-9">using FASTA in local machine</a></h3><!-- RDLabel: "using FASTA in local machine" -->
|
457
466
|
<p>Install the fasta program on your machine (the command name looks like
|
458
|
-
fasta34. FASTA can be downloaded from ftp://ftp.virginia.edu/pub/fasta/)
|
459
|
-
First, you must prepare your FASTA-formatted database sequence file
|
467
|
+
fasta34. FASTA can be downloaded from ftp://ftp.virginia.edu/pub/fasta/).</p>
|
468
|
+
<p>First, you must prepare your FASTA-formatted database sequence file
|
460
469
|
target.pep and FASTA-formatted query.pep. </p>
|
461
470
|
<pre>#!/usr/bin/env ruby
|
462
471
|
|
@@ -489,19 +498,18 @@ ff.each do |entry|
|
|
489
498
|
end
|
490
499
|
end
|
491
500
|
end</pre>
|
492
|
-
<p>We named above script
|
501
|
+
<p>We named the above script f_search.rb. You can execute it as follows:</p>
|
493
502
|
<pre>% ./f_search.rb query.pep target.pep > f_search.out</pre>
|
494
503
|
<p>In the above script, the variable "factory" is a factory object for executing
|
495
504
|
FASTA many times easily. Instead of using Fasta#query method,
|
496
505
|
Bio::Sequence#fasta method can be used.</p>
|
497
506
|
<pre>seq = ">test seq\nYQVLEEIGRGSFGSVRKVIHIPTKKLLVRKDIKYGHMNSKE"
|
498
507
|
seq.fasta(factory)</pre>
|
499
|
-
<p>When you want to add options to FASTA
|
500
|
-
third argument of Bio::Fasta.local method. For example,
|
501
|
-
and getting top-10 hits:</p>
|
508
|
+
<p>When you want to add options to FASTA commands, you can set the
|
509
|
+
third argument of the Bio::Fasta.local method. For example, the following sets ktup to 1 and gets a list of the top 10 hits:</p>
|
502
510
|
<pre>factory = Bio::Fasta.local('fasta34', 'target.pep', '-b 10')
|
503
511
|
factory.ktup = 1</pre>
|
504
|
-
<p>Bio::Fasta#query returns Bio::Fasta::Report object.
|
512
|
+
<p>Bio::Fasta#query returns a Bio::Fasta::Report object.
|
505
513
|
We can get almost all information described in FASTA report text
|
506
514
|
with the Report object. For example, getting information for hits:</p>
|
507
515
|
<pre>report.each do |hit|
|
@@ -527,11 +535,10 @@ with the Report object. For example, getting information for hits:</p>
|
|
527
535
|
# in hit(target) sequence
|
528
536
|
puts hit.lap_at # array of above four numbers
|
529
537
|
end</pre>
|
530
|
-
<p>Most of above methods are common
|
531
|
-
below. Please refer to
|
538
|
+
<p>Most of the above methods are common to the Bio::Blast::Report described
|
539
|
+
below. Please refer to the documentation of the Bio::Fasta::Report class for
|
532
540
|
FASTA-specific details.</p>
|
533
|
-
<p>If you need original output text of FASTA program you can use the "output"
|
534
|
-
method of the factory object after the "query" method.</p>
|
541
|
+
<p>If you need the original output text of the FASTA program, you can use the "output" method of the factory object after the "query" method.</p>
|
535
542
|
<pre>report = factory.query(entry)
|
536
543
|
puts factory.output</pre>
|
537
544
|
<h3><a name="label-10" id="label-10">using FASTA from a remote internet site</a></h3><!-- RDLabel: "using FASTA from a remote internet site" -->
|
@@ -558,7 +565,7 @@ same things as with a local method.</p>
|
|
558
565
|
<p>Select the databases you require. Next, choose the search program according to
|
559
566
|
the type of query sequence and database.</p>
|
560
567
|
<ul>
|
561
|
-
<li>When query is
|
568
|
+
<li>When query is an amino acid sequence
|
562
569
|
<ul>
|
563
570
|
<li>When protein database, program is "fasta".</li>
|
564
571
|
<li>When nucleic database, program is "tfasta".</li>
|
@@ -566,10 +573,10 @@ the type of query sequence and database.</p>
|
|
566
573
|
<li>When query is a nucleic acid sequence
|
567
574
|
<ul>
|
568
575
|
<li>When nucleic database, program is "fasta".</li>
|
569
|
-
<li>(When protein database,
|
576
|
+
<li>(When protein database, the search would fail.)</li>
|
570
577
|
</ul></li>
|
571
578
|
</ul>
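The selection rules in the list above can be written down as a small lookup. This is a sketch in plain Ruby: the helper name fasta_program_for and the :protein / :nucleic symbols are labels chosen for this illustration and are not part of BioRuby.

```ruby
# Hypothetical helper (not a BioRuby method): pick the search program
# from the type of the query sequence and the type of the database.
def fasta_program_for(query_type, database_type)
  case [query_type, database_type]
  when [:protein, :protein] then 'fasta'
  when [:protein, :nucleic] then 'tfasta'
  when [:nucleic, :nucleic] then 'fasta'
  else
    # Covers the nucleic-query / protein-database case, where the
    # search would fail, as well as any unknown type.
    raise ArgumentError,
          "no FASTA program for #{query_type} query against #{database_type} database"
  end
end

program = fasta_program_for(:protein, :nucleic)
puts program
```

Raising early for the unsupported combination avoids submitting a search that is known to fail.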
|
572
|
-
<p>For example:</p>
|
579
|
+
<p>For example, run:</p>
|
573
580
|
<pre>program = 'fasta'
|
574
581
|
database = 'genes'
|
575
582
|
|
@@ -600,7 +607,7 @@ The parameter "program" is different from FASTA - as you can expect:</p>
|
|
600
607
|
<p>Bio::BLAST uses "-m 7" XML output of BLAST by default when either
|
601
608
|
XMLParser or REXML (both of them are XML parser libraries for Ruby -
|
602
609
|
of the two XMLParser is the fastest) is installed on your computer. In
|
603
|
-
Ruby version 1.8.0
|
610
|
+
Ruby version 1.8.0 or later, REXML is bundled with Ruby's
|
604
611
|
distribution.</p>
|
605
612
|
<p>When no XML parser library is present, Bio::BLAST uses "-m 8" tabular
|
606
613
|
delimited format. Available information is limited with the
|
@@ -631,9 +638,9 @@ midline.</p>
|
|
631
638
|
puts hit.lap_at
|
632
639
|
end</pre>
|
633
640
|
<p>For simplicity and API compatibility, some information such as score
|
634
|
-
|
641
|
+
is extracted from the first Hsp (High-scoring Segment Pair).</p>
|
635
642
|
<p>Check the documentation for Bio::Blast::Report to see what can be
|
636
|
-
retrieved. For now suffice to
|
643
|
+
retrieved. For now, suffice it to say that Bio::Blast::Report has a
|
637
644
|
hierarchical structure mirroring the general BLAST output stream:</p>
|
638
645
|
<ul>
|
639
646
|
<li>In a Bio::Blast::Report object, @iterations is an array of
|
@@ -699,11 +706,10 @@ want to add other sites, you must write the following:</p>
|
|
699
706
|
named "exec_MYSITE" to get query sequence and to pass the result to
|
700
707
|
Bio::Blast::Report.new(or Bio::Blast::Default::Report.new):</p>
|
701
708
|
<pre>factory = Bio::Blast.remote(program, db, option, 'MYSITE')</pre>
|
702
|
-
<p>When you write above routines, please send to the BioRuby project and
|
703
|
-
they may be included.</p>
|
709
|
+
<p>When you write the above routines, please send them to the BioRuby project, and they may be included in future releases.</p>
|
704
710
|
<h2><a name="label-14" id="label-14">Generate a reference list using PubMed (Bio::PubMed)</a></h2><!-- RDLabel: "Generate a reference list using PubMed (Bio::PubMed)" -->
|
705
711
|
<p>Nowadays using NCBI E-Utils is recommended. Use Bio::PubMed.esearch
|
706
|
-
and Bio::PubMed.efetch
|
712
|
+
and Bio::PubMed.efetch.</p>
|
707
713
|
<pre>#!/usr/bin/env ruby
|
708
714
|
|
709
715
|
require 'bio'
|
@@ -741,7 +747,7 @@ BibTeX format bibliography data to a file named genoinfo.bib.</p>
|
|
741
747
|
% ./pmsearch.rb genome bioinformatics >> genoinfo.bib</pre>
|
742
748
|
<p>The BibTeX can be used with TeX or LaTeX to form bibliography
|
743
749
|
information with your journal article. For more information
|
744
|
-
on BibTex see
|
750
|
+
on using BibTeX, see the <a href="http://www.bibtex.org/Using/">BibTeX HowTo site</a>. A quick example:</p>
|
745
751
|
<p>Save this to hoge.tex:</p>
|
746
752
|
<pre>\documentclass{jarticle}
|
747
753
|
\begin{document}
|
@@ -754,12 +760,11 @@ foo bar KEGG database~\cite{PMID:10592173} baz hoge fuga.
|
|
754
760
|
% bibtex hoge # processes genoinfo.bib
|
755
761
|
% latex hoge # creates bibliography list
|
756
762
|
% latex hoge # inserts correct bibliography reference</pre>
|
757
|
-
<p>Now, you get hoge.dvi and hoge.ps - the latter
|
758
|
-
Postscript viewer.</p>
|
763
|
+
<p>Now, you get hoge.dvi and hoge.ps - the latter of which can be viewed with any Postscript viewer.</p>
|
759
764
|
<h3><a name="label-16" id="label-16">Bio::Reference#bibitem</a></h3><!-- RDLabel: "Bio::Reference#bibitem" -->
|
760
765
|
<p>When you don't want to create a bib file, you can use
|
761
766
|
Bio::Reference#bibitem method instead of Bio::Reference#bibtex.
|
762
|
-
In above pmfetch.rb and pmsearch.rb scripts, change</p>
|
767
|
+
In the above pmfetch.rb and pmsearch.rb scripts, change</p>
|
763
768
|
<pre>puts reference.bibtex</pre>
|
764
769
|
<p>to</p>
|
765
770
|
<pre>puts reference.bibitem</pre>
|
@@ -801,12 +806,12 @@ BioRuby and other projects' members (2002).</p>
|
|
801
806
|
</ul></li>
|
802
807
|
<li>BioSQL
|
803
808
|
<ul>
|
804
|
-
<li>Schemas to store sequence data to relational
|
809
|
+
<li>Schemas to store sequence data to relational databases such as
|
805
810
|
MySQL and PostgreSQL, and methods to retrieve entries from the database.</li>
|
806
811
|
</ul></li>
|
807
812
|
</ul>
|
808
|
-
<p>
|
809
|
-
<a href="http://obda.open-bio.org
|
813
|
+
<p>This tutorial only gives a quick overview of OBDA. Check out
|
814
|
+
<a href="http://obda.open-bio.org">the OBDA site</a> for more extensive details.</p>
|
810
815
|
<h2><a name="label-18" id="label-18">BioRegistry</a></h2><!-- RDLabel: "BioRegistry" -->
|
811
816
|
<p>BioRegistry allows for locating retrieval methods and database
|
812
817
|
locations through configuration files. The priorities are</p>
|
@@ -821,14 +826,14 @@ when all local configulation files are not available.</p>
|
|
821
826
|
<p>In the current BioRuby implementation all local configuration files
|
822
827
|
are read. For databases with the same name settings encountered first
|
823
828
|
are used. This means that if you don't like some settings of a
|
824
|
-
database in system global configuration file
|
825
|
-
(/etc/bioinformatics/seqdatabase.ini), you can easily override
|
829
|
+
database in the system's global configuration file
|
830
|
+
(/etc/bioinformatics/seqdatabase.ini), you can easily override them by
|
826
831
|
writing settings to ~/.bioinformatics/seqdatabase.ini.</p>
|
827
832
|
<p>The syntax of the configuration file is called a stanza format. For example</p>
|
828
833
|
<pre>[DatabaseName]
|
829
834
|
protocol=ProtocolName
|
830
|
-
location=
|
831
|
-
<p>You can write a description like above entry for every database.</p>
|
835
|
+
location=ServerName</pre>
|
836
|
+
<p>You can write a description like the above entry for every database.</p>
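The stanza format and the "settings encountered first are used" rule can be illustrated in plain Ruby. The parser below is a hypothetical sketch, not the reader used by Bio::Registry, and the file contents and server names are made up. It also assumes that the per-user file is read before the system-wide one, which is what allows the override described above.

```ruby
# Hypothetical parser for the stanza format; BioRuby's own reader
# (Bio::Registry) may behave differently in detail.
def parse_stanzas(text)
  stanzas = []
  current = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line =~ /\A\[(.+)\]\z/
      # A "[DatabaseName]" line starts a new stanza.
      current = { 'database' => $1 }
      stanzas << current
    elsif current && line.include?('=')
      # "key=value" lines belong to the most recent stanza.
      key, value = line.split('=', 2)
      current[key.strip] = value.strip
    end
  end
  stanzas
end

# Made-up file contents; the user file is assumed to be read first.
user_ini   = "[genbank]\nprotocol=biofetch\nlocation=http://example.org/biofetch\n"
system_ini = "[genbank]\nprotocol=biosql\nlocation=db.example.org\n"

# For databases with the same name, keep the settings encountered first.
registry = {}
[user_ini, system_ini].each do |ini|
  parse_stanzas(ini).each { |stanza| registry[stanza['database']] ||= stanza }
end

puts registry['genbank']['protocol']
```

Here the user-level stanza wins because it is seen first, and the later system-level stanza with the same name is ignored.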
|
832
837
|
<p>The database name is a local label for yourself, so you can name it
|
833
838
|
freely and it can differ from the name of the actual databases. In the
|
834
839
|
actual specification of BioRegistry where there are two or more
|
@@ -836,8 +841,8 @@ settings for a database of the same name, it is proposed that
|
|
836
841
|
connection to the database is tried sequentially with the order
|
837
842
|
written in configuration files. However, this has not (yet) been
|
838
843
|
implemented in BioRuby.</p>
|
839
|
-
<p>In addition, for some
|
840
|
-
other than locations (e.g. user name
|
844
|
+
<p>In addition, for some protocols, you must set additional options
|
845
|
+
other than locations (e.g. user name for MySQL). In the BioRegistry
|
841
846
|
specification, the currently available protocols are:</p>
|
842
847
|
<ul>
|
843
848
|
<li>index-flat</li>
|
@@ -850,8 +855,7 @@ specification, current available protocols are:</p>
|
|
850
855
|
<p>In BioRuby, you can use index-flat, index-berkeleydb, biofetch and biosql.
|
851
856
|
Note that the BioRegistry specification sometimes gets updated and BioRuby
|
852
857
|
does not always follow quickly.</p>
|
853
|
-
<p>Here an example.
|
854
|
-
files:</p>
|
858
|
+
<p>Here is an example. It creates a Bio::Registry object and reads the configuration files:</p>
|
855
859
|
<pre>reg = Bio::Registry.new
|
856
860
|
|
857
861
|
# connects to the database "genbank"
|
@@ -859,42 +863,39 @@ serv = reg.get_database('genbank')
|
|
859
863
|
|
860
864
|
# gets entry of the ID
|
861
865
|
entry = serv.get_by_id('AA2CG')</pre>
|
862
|
-
<p>The variable "serv" is a server object corresponding to the
|
863
|
-
written in configuration files. The class of the object is one of
|
866
|
+
<p>The variable "serv" is a server object corresponding to the settings
|
867
|
+
written in the configuration files. The class of the object is one of
|
864
868
|
Bio::SQL, Bio::Fetch, and so on. Note that Bio::Registry#get_database("name")
|
865
869
|
returns nil if no database is found.</p>
|
866
|
-
<p>After that, you can use get_by_id method and some specific methods.
|
867
|
-
Please refer to below
|
870
|
+
<p>After that, you can use the get_by_id method and some specific methods.
|
871
|
+
Please refer to the sections below for more information.</p>
|
868
872
|
<h2><a name="label-19" id="label-19">BioFlat</a></h2><!-- RDLabel: "BioFlat" -->
|
869
873
|
<p>BioFlat is a mechanism to create index files of flat files and to retrieve
|
870
874
|
these entries fast. There are two index types. index-flat is a simple index
|
871
|
-
performing binary search without using
|
875
|
+
performing binary search without using any external libraries of Ruby. index-berkeleydb
|
872
876
|
uses Berkeley DB for indexing - but requires installing bdb on your computer,
|
873
|
-
as well as the BDB Ruby package.
|
874
|
-
br_bioflat.rb command bundled with BioRuby.</p>
|
877
|
+
as well as the BDB Ruby package. To create the index itself, you can use the br_bioflat.rb command bundled with BioRuby.</p>
|
875
878
|
<pre>% br_bioflat.rb --makeindex database_name [--format data_format] filename...</pre>
|
876
879
|
<p>The format can be omitted because BioRuby has autodetection. If that
|
877
|
-
|
878
|
-
database class.</p>
|
880
|
+
doesn't work, you can try specifying the data format as the name of a BioRuby database class.</p>
|
879
881
|
<p>Search and retrieve data from database:</p>
|
880
882
|
<pre>% br_bioflat.rb database_name identifier</pre>
|
881
|
-
<p>For example, to create index of GenBank files gbbct*.seq and get entry
|
882
|
-
from the database:</p>
|
883
|
+
<p>For example, to create an index of GenBank files gbbct*.seq and get the entry from the database:</p>
|
883
884
|
<pre>% br_bioflat.rb --makeindex my_bctdb --format GenBank gbbct*.seq
|
884
885
|
% br_bioflat.rb my_bctdb A16STM262</pre>
|
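To make the index-flat idea concrete, here is a minimal sketch of looking up an identifier by binary search in a sorted index. It is not BioFlat's actual on-disk format or code: the `lookup_offset` helper, the extra identifiers, and the in-memory array of [identifier, byte offset] pairs are assumptions made purely for illustration.

```ruby
# Hypothetical in-memory index: [identifier, byte offset] pairs,
# kept sorted by identifier so that binary search is possible.
INDEX = [
  ['A16STM262', 0],
  ['B00001', 1520],
  ['C00002', 4096]
].freeze

# Binary search for an identifier; returns its byte offset, or nil if absent.
def lookup_offset(index, key)
  entry = index.bsearch { |id, _offset| key <=> id }
  entry && entry[1]
end

puts lookup_offset(INDEX, 'B00001').inspect
puts lookup_offset(INDEX, 'ZZZ').inspect
```

A real flat-file index would then seek to the returned offset in the data file and read one entry; that step is omitted here.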
885
886
|
<p>If you have Berkeley DB on your system and installed the bdb extension
|
886
|
-
module of Ruby (see http://raa.ruby-lang.org/project/bdb/), you can
|
887
|
+
module of Ruby (see <a href="http://raa.ruby-lang.org/project/bdb/">the BDB project page</a> ), you can
|
887
888
|
create and search indexes with Berkeley DB - a very fast alternative
|
888
889
|
that uses little computer memory. When creating the index, use the
|
889
890
|
"--makeindex-bdb" option instead of "--makeindex".</p>
|
890
891
|
<pre>% br_bioflat.rb --makeindex-bdb database_name [--format data_format] filename...</pre>
|
891
892
|
<h2><a name="label-20" id="label-20">BioFetch</a></h2><!-- RDLabel: "BioFetch" -->
|
892
893
|
<pre>Note: this section is an advanced topic</pre>
|
893
|
-
<p>BioFetch is a database retrieval mechanism via CGI.
|
894
|
-
options and error codes are standardized.
|
894
|
+
<p>BioFetch is a database retrieval mechanism via CGI. CGI Parameters,
|
895
|
+
options and error codes are standardized. Client access via
|
895
896
|
http is possible giving the database name, identifiers and format to
|
896
897
|
retrieve entries.</p>
|
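As a rough illustration of "giving the database name, identifiers and format" over http, the sketch below only builds a request URL and does not contact any server. The parameter names `db`, `id`, `format` and `style` are assumed here from the BioFetch convention and should be checked against the server you use; `biofetch_url` is a hypothetical helper, not part of BioRuby.

```ruby
require 'uri'

# Build a BioFetch-style request URL (hypothetical helper).
# The parameter names are an assumption, not taken from this tutorial.
def biofetch_url(base, db, ids, format = 'default', style = 'raw')
  uri = URI.parse(base)
  uri.query = URI.encode_www_form(
    'db' => db,
    'id' => Array(ids).join(','),
    'format' => format,
    'style' => style
  )
  uri.to_s
end

puts biofetch_url('http://bioruby.org/cgi-bin/biofetch.rb', 'genbank', 'AA2CG')
```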
897
|
-
<p>The BioRuby project has a BioFetch server
|
898
|
+
<p>The BioRuby project has a BioFetch server at bioruby.org. It uses
|
898
899
|
GenomeNet's DBGET system as a backend. The source code of the
|
899
900
|
server is in sample/ directory. Currently, there are only two
|
900
901
|
BioFetch servers in the world: bioruby.org and EBI.</p>
|
@@ -912,18 +913,18 @@ entry = serv.fetch(db_name, entry_id)</pre></li>
|
|
912
913
|
serv = reg.get_database('genbank')
|
913
914
|
entry = serv.get_by_id('AA2CG')</pre></li>
|
914
915
|
</ol>
|
915
|
-
<p>If you want to use (4), you
|
916
|
-
in seqdatabase.ini.
|
916
|
+
<p>If you want to use (4), you have to include some settings
|
917
|
+
in seqdatabase.ini. For example:</p>
|
917
918
|
<pre>[genbank]
|
918
919
|
protocol=biofetch
|
919
920
|
location=http://bioruby.org/cgi-bin/biofetch.rb
|
920
921
|
biodbname=genbank</pre>
|
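To show how such a stanza maps onto key/value settings, here is a small stand-alone sketch that reads the text above into a hash. It is not how Bio::Registry parses its configuration files; `parse_stanzas` is a hypothetical helper written only for this example.

```ruby
# Parse "[name]" stanzas followed by key=value lines into a nested hash.
def parse_stanzas(text)
  stanzas = {}
  current = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?('#')
    if (m = line.match(/\A\[(.+)\]\z/))
      # A new stanza starts; later key=value lines belong to it.
      current = stanzas[m[1]] = {}
    elsif current && line.include?('=')
      key, value = line.split('=', 2)
      current[key.strip] = value.strip
    end
  end
  stanzas
end

config = parse_stanzas(<<~INI)
  [genbank]
  protocol=biofetch
  location=http://bioruby.org/cgi-bin/biofetch.rb
  biodbname=genbank
INI

puts config['genbank']['protocol']
```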
921
922
|
<h3><a name="label-21" id="label-21">The combination of BioFetch, Bio::KEGG::GENES and Bio::AAindex1</a></h3><!-- RDLabel: "The combination of BioFetch, Bio::KEGG::GENES and Bio::AAindex1" -->
|
922
|
-
<p>Bioinformatics is often about
|
923
|
-
example
|
924
|
-
Halobacterium from KEGG GENES database and
|
923
|
+
<p>Bioinformatics is often about gluing things together. Here is an
|
924
|
+
example that gets the bacteriorhodopsin gene (VNG1467G) of the archaea
|
925
|
+
Halobacterium from KEGG GENES database and gets alpha-helix index
|
925
926
|
data (BURA740101) from the AAindex (Amino acid indices and similarity
|
926
|
-
matrices) database, and
|
927
|
+
matrices) database, and shows the helix score for each 15-aa length
|
927
928
|
overlapping window.</p>
|
928
929
|
<pre>#!/usr/bin/env ruby
|
929
930
|
|
@@ -943,14 +944,14 @@ aaseq.window_search(win_size) do |subseq|
|
|
943
944
|
puts [ position, score ].join("\t")
|
944
945
|
position += 1
|
945
946
|
end</pre>
|
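The overlapping-window scoring can also be sketched without any network access. The version below is plain Ruby and averages the per-residue values in each window; whether the script above sums or averages is not assumed here, and the `window_scores` helper and the toy index values are invented for illustration (they are not the BURA740101 data).

```ruby
# Score every overlapping window of win_size residues.
# Returns [1-based start position, mean index value] pairs.
def window_scores(seq, index, win_size)
  (0..seq.length - win_size).map do |start|
    window = seq[start, win_size]
    total = window.each_char.inject(0.0) { |sum, aa| sum + index.fetch(aa, 0.0) }
    [start + 1, total / win_size]
  end
end

# Toy values only, chosen to make the arithmetic easy to follow.
toy_index = { 'A' => 1.0, 'G' => 0.0 }

window_scores('AAGG', toy_index, 2).each do |position, score|
  puts [position, score].join("\t")
end
```

A sequence shorter than the window simply yields no rows.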
946
|
-
<p>The special method Bio::Fetch.query uses preset BioFetch server
|
947
|
-
|
947
|
+
<p>The special method Bio::Fetch.query uses the preset BioFetch server
|
948
|
+
at bioruby.org. (The server internally gets data from GenomeNet.
|
948
949
|
Because the KEGG/GENES database and AAindex database are not available
|
949
|
-
from other BioFetch servers, we used bioruby.org server with
|
950
|
+
from other BioFetch servers, we used the bioruby.org server with
|
950
951
|
Bio::Fetch.query method.)</p>
|
951
952
|
<h2><a name="label-22" id="label-22">BioSQL</a></h2><!-- RDLabel: "BioSQL" -->
|
952
|
-
<p>BioSQL is a well known schema to store and retrive biological sequences using a RDBMS like PostgreSQL or MySQL
|
953
|
-
First of all, you must install a database engine or have access to a remote one. Then create the schema and populate with the taxonomy. You can follow the <a href="http://code.open-bio.org/svnweb/index.cgi/biosql/view/biosql-schema/trunk/INSTALL">Official Guide</a> .
|
953
|
+
<p>BioSQL is a well-known schema to store and retrieve biological sequences using an RDBMS such as PostgreSQL or MySQL; note that SQLite is not supported.
|
954
|
+
First of all, you must install a database engine or have access to a remote one. Then create the schema and populate with the taxonomy. You can follow the <a href="http://code.open-bio.org/svnweb/index.cgi/biosql/view/biosql-schema/trunk/INSTALL">Official Guide</a> to accomplish these steps.
|
954
955
|
The next step is to install these gems:</p>
|
955
956
|
<ul>
|
956
957
|
<li>ActiveRecord</li>
|
@@ -958,21 +959,22 @@ Next step is to install these gems:</p>
|
|
958
959
|
<li>The layer to communicate with your preferred RDBMS (postgresql, mysql, or jdbcmysql if you are running JRuby)</li>
|
959
960
|
</ul>
|
960
961
|
<p>You can find ActiveRecord's models in /bioruby/lib/bio/io/biosql</p>
|
961
|
-
<p>When you have your database up and running, you can connect to it
|
962
|
+
<p>When you have your database up and running, you can connect to it like this:</p>
|
962
963
|
<pre>#!/usr/bin/env ruby
|
963
964
|
|
964
965
|
require 'bio'
|
965
966
|
|
966
967
|
connection = Bio::SQL.establish_connection({'development'=>{'hostname'=>"YourHostname",
|
967
|
-
|
968
|
-
|
969
|
-
|
970
|
-
|
971
|
-
|
972
|
-
|
973
|
-
|
968
|
+
'database'=>"CoolBioSeqDB",
|
969
|
+
'adapter'=>"jdbcmysql",
|
970
|
+
'username'=>"YourUser",
|
971
|
+
'password'=>"YourPassword"
|
972
|
+
}
|
973
|
+
},
|
974
|
+
'development')
|
974
975
|
|
975
|
-
#The first parameter is the hash contaning the description of the configuration similar to database.yml in Rails
|
976
|
+
#The first parameter is the hash containing the description of the configuration; similar to database.yml in Rails applications, you can declare different environments.
|
977
|
+
#The second parameter is the environment to use: 'development', 'test', or 'production'.
|
976
978
|
|
977
979
|
#To store a sequence into the database you simply need a biosequence object.
|
978
980
|
biosql_database = Bio::SQL::Biodatabase.find(:first)
|
@@ -991,27 +993,28 @@ Bio::SQL.list_databases
|
|
991
993
|
#retrieving a generic accession
|
992
994
|
bioseq = Bio::SQL.fetch_accession("YourAccession")
|
993
995
|
|
994
|
-
#If you use biosequence objects, you will find all its method mapped to BioSQL sequences.
|
996
|
+
#If you use biosequence objects, you will find all their methods mapped to BioSQL sequences.
|
997
|
+
#But you can also access the models directly:
|
995
998
|
|
996
|
-
#get the raw sequence associated with
|
999
|
+
#get the raw sequence associated with your accession
|
997
1000
|
bioseq.entry.biosequence
|
998
1001
|
|
999
|
-
#get the length of your sequence
|
1002
|
+
#get the length of your sequence; this is the explicit form of bioseq.length
|
1000
1003
|
bioseq.entry.biosequence.length
|
1001
1004
|
|
1002
|
-
#convert the sequence
|
1005
|
+
#convert the sequence into GenBank format
|
1003
1006
|
bioseq.to_biosequence.output(:genbank)</pre>
|
1004
|
-
<p>BioSQL' <a href="http://www.biosql.org/wiki/Schema_Overview">schema</a> is not
|
1007
|
+
<p>BioSQL's <a href="http://www.biosql.org/wiki/Schema_Overview">schema</a> is not very intuitive for beginners, so spend some time on understanding it. In the end if you know a little bit of Ruby on Rails, everything will go smoothly. You can find information on Annotation <a href="http://www.biosql.org/wiki/Annotation_Mapping">here</a>.
|
1005
1008
|
ToDo: add examples from George. I remember he did some cool post on BioSQL and Rails.</p>
|
1006
1009
|
<h1><a name="label-23" id="label-23">PhyloXML</a></h1><!-- RDLabel: "PhyloXML" -->
|
1007
1010
|
<p>PhyloXML is an XML language for saving, analyzing and exchanging data of
|
1008
|
-
annotated phylogenetic trees. PhyloXML parser in BioRuby is implemented in
|
1009
|
-
Bio::PhyloXML::Parser and writer in Bio::PhyloXML::Writer.
|
1010
|
-
More information at www.phyloxml.org</p>
|
1011
|
+
annotated phylogenetic trees. PhyloXML's parser in BioRuby is implemented in
|
1012
|
+
Bio::PhyloXML::Parser, and its writer in Bio::PhyloXML::Writer.
|
1013
|
+
More information can be found at <a href="http://www.phyloxml.org">www.phyloxml.org</a>.</p>
|
1011
1014
|
<h2><a name="label-24" id="label-24">Requirements</a></h2><!-- RDLabel: "Requirements" -->
|
1012
|
-
<p>In addition to BioRuby
|
1015
|
+
<p>In addition to BioRuby, you need the libxml Ruby bindings. To install, execute:</p>
|
1013
1016
|
<pre>% gem install -r libxml-ruby</pre>
|
1014
|
-
<p>For more information see <a href="http://libxml.rubyforge.org/install.xml"
|
1017
|
+
<p>For more information see the <a href="http://libxml.rubyforge.org/install.xml">libxml installer page</a></p>
|
1015
1018
|
<h2><a name="label-25" id="label-25">Parsing a file</a></h2><!-- RDLabel: "Parsing a file" -->
|
1016
1019
|
<pre>require 'bio'
|
1017
1020
|
|
@@ -1022,9 +1025,9 @@ phyloxml = Bio::PhyloXML::Parser.open('example.xml')
|
|
1022
1025
|
phyloxml.each do |tree|
|
1023
1026
|
puts tree.name
|
1024
1027
|
end</pre>
|
1025
|
-
<p>If there are several trees in the file, you can access the one you wish by
|
1028
|
+
<p>If there are several trees in the file, you can access the one you wish by specifying its index:</p>
|
1026
1029
|
<pre>tree = phyloxml[3]</pre>
|
1027
|
-
<p>You can use all Bio::Tree methods on the tree, since PhyloXML::Tree inherits from Bio::Tree. For example
|
1030
|
+
<p>You can use all Bio::Tree methods on the tree, since PhyloXML::Tree inherits from Bio::Tree. For example:</p>
|
1028
1031
|
<pre>tree.leaves.each do |node|
|
1029
1032
|
puts node.name
|
1030
1033
|
end</pre>
|
@@ -1045,7 +1048,7 @@ writer.write(tree1)
|
|
1045
1048
|
# Add another tree to the file
|
1046
1049
|
writer.write(tree2)</pre>
|
1047
1050
|
<h2><a name="label-27" id="label-27">Retrieving data</a></h2><!-- RDLabel: "Retrieving data" -->
|
1048
|
-
<p>Here is an example of how to retrieve the scientific name of the clades.</p>
|
1051
|
+
<p>Here is an example of how to retrieve the scientific name of the clades included in each tree.</p>
|
1049
1052
|
<pre>require 'bio'
|
1050
1053
|
|
1051
1054
|
phyloxml = Bio::PhyloXML::Parser.open('ncbi_taxonomy_mollusca.xml')
|
@@ -1086,7 +1089,7 @@ end
|
|
1086
1089
|
#aggtcgcggcctgtggaagtcctctcct
|
1087
1090
|
#taaatcgc--cccgtgg-agtccc-cct</pre>
|
1088
1091
|
<h2><a name="label-29" id="label-29">The BioRuby example programs</a></h2><!-- RDLabel: "The BioRuby example programs" -->
|
1089
|
-
<p>Some sample programs are stored in ./samples/ directory.
|
1092
|
+
<p>Some sample programs are stored in the ./sample/ directory. For example, the na2aa.rb program (which translates a nucleic acid sequence into an amino acid sequence) can be run using:</p>
|
1090
1093
|
<pre>./sample/na2aa.rb test/data/fasta/example1.txt </pre>
|
1091
1094
|
<h2><a name="label-30" id="label-30">Unit testing and doctests</a></h2><!-- RDLabel: "Unit testing and doctests" -->
|
1092
1095
|
<p>BioRuby comes with an extensive testing framework with over 1300 tests and 2700
|
@@ -1098,23 +1101,23 @@ in this tutorial to doctest - more info upcoming.</p>
|
|
1098
1101
|
<h2><a name="label-31" id="label-31">Further reading</a></h2><!-- RDLabel: "Further reading" -->
|
1099
1102
|
<p>See the BioRuby in anger Wiki. A lot of BioRuby's documentation exists in the
|
1100
1103
|
source code and unit tests. To really dive in you will need the latest source
|
1101
|
-
code tree. The embedded rdoc documentation can be viewed online at
|
1104
|
+
code tree. The embedded rdoc documentation for the BioRuby source code can be viewed online at
|
1102
1105
|
<a href="http://bioruby.org/rdoc/"><URL:http://bioruby.org/rdoc/></a>.</p>
|
1103
1106
|
<h2><a name="label-32" id="label-32">BioRuby Shell</a></h2><!-- RDLabel: "BioRuby Shell" -->
|
1104
|
-
<p>The BioRuby shell implementation
|
1107
|
+
<p>The BioRuby shell implementation is located in ./lib/bio/shell. It is very interesting
|
1105
1108
|
as it uses IRB (the interactive Ruby interpreter), which is a powerful environment described in
|
1106
|
-
<a href="http://ruby-doc.org/docs/ProgrammingRuby/html/irb.html">Programming Ruby's
|
1109
|
+
<a href="http://ruby-doc.org/docs/ProgrammingRuby/html/irb.html">Programming Ruby's IRB chapter</a>. IRB commands can be typed directly into the shell, e.g.</p>
|
1107
1110
|
<pre>bioruby!> IRB.conf[:PROMPT_MODE]
|
1108
1111
|
==!> :PROMPT_C</pre>
|
1109
|
-
<p>
|
1112
|
+
<p>Additionally, you also may want to install the optional Ruby readline support -
|
1110
1113
|
with Debian libreadline-ruby. To edit a previous line you may have to press
|
1111
|
-
line down (arrow
|
1114
|
+
line down (down arrow) first.</p>
|
1112
1115
|
<h1><a name="label-33" id="label-33">Helpful tools</a></h1><!-- RDLabel: "Helpful tools" -->
|
1113
1116
|
<p>Apart from rdoc you may also want to use rtags - which allows jumping around
|
1114
1117
|
source code by clicking on class and method names. </p>
|
1115
1118
|
<pre>cd bioruby/lib
|
1116
1119
|
rtags -R --vi</pre>
|
1117
|
-
<p>For a tutorial see <a href="http://rtags.rubyforge.org/"
|
1120
|
+
<p>For a tutorial see <a href="http://rtags.rubyforge.org/">here</a></p>
|
1118
1121
|
<h1><a name="label-34" id="label-34">APPENDIX</a></h1><!-- RDLabel: "APPENDIX" -->
|
1119
1122
|
<h2><a name="label-35" id="label-35">KEGG API</a></h2><!-- RDLabel: "KEGG API" -->
|
1120
1123
|
<p>Please refer to KEGG_API.rd.ja (English version: <a href="http://www.genome.jp/kegg/soap/doc/keggapi_manual.html"><URL:http://www.genome.jp/kegg/soap/doc/keggapi_manual.html></a> ) and</p>
|
@@ -1122,9 +1125,9 @@ rtags -R --vi</pre>
|
|
1122
1125
|
<li><a href="http://www.genome.jp/kegg/soap/"><URL:http://www.genome.jp/kegg/soap/></a></li>
|
1123
1126
|
</ul>
|
1124
1127
|
<h2><a name="label-36" id="label-36">Ruby Ensembl API</a></h2><!-- RDLabel: "Ruby Ensembl API" -->
|
1125
|
-
<p>Ruby Ensembl API is a
|
1128
|
+
<p>The Ruby Ensembl API is a Ruby API to the Ensembl database. It is NOT currently
|
1126
1129
|
included in the BioRuby archives. To install it, see
|
1127
|
-
<a href="http://wiki.github.com/jandot/ruby-ensembl-api"
|
1130
|
+
<a href="http://wiki.github.com/jandot/ruby-ensembl-api">the Ruby-Ensembl Github</a>
|
1128
1131
|
for more information.</p>
|
1129
1132
|
<h3><a name="label-37" id="label-37">Gene Ontology (GO) through the Ruby Ensembl API</a></h3><!-- RDLabel: "Gene Ontology (GO) through the Ruby Ensembl API" -->
|
1130
1133
|
<p>Gene Ontologies can be fetched through the Ruby Ensembl API package:</p>
|
@@ -1134,7 +1137,7 @@ infile = IO.readlines(ARGV.shift) # reading your comma-separated accession mappi
|
|
1134
1137
|
infile.each do |line|
|
1135
1138
|
accs = line.split(",") # Split the comma-sep.entries into an array
|
1136
1139
|
drosophila_acc = accs.shift # the first entry is the Drosophila acc
|
1137
|
-
mosq_acc = accs.shift # the second entry is
|
1140
|
+
mosq_acc = accs.shift # the second entry is your Mosq. acc
|
1138
1141
|
gene = Ensembl::Core::Gene.find_by_stable_id(drosophila_acc)
|
1139
1142
|
print "#{mosq_acc}"
|
1140
1143
|
gene.go_terms.each do |go|
|
@@ -1145,9 +1148,9 @@ end</pre>
|
|
1145
1148
|
homologues.</p>
|
1146
1149
|
<h2><a name="label-38" id="label-38">Using BioPerl or BioPython from Ruby</a></h2><!-- RDLabel: "Using BioPerl or BioPython from Ruby" -->
|
1147
1150
|
<p>At the moment there is no easy way of accessing BioPerl from Ruby. The best way, perhaps, is to create a Perl server that gets accessed through XML/RPC or SOAP.</p>
|
1148
|
-
<h2><a name="label-39" id="label-39">Installing required external
|
1151
|
+
<h2><a name="label-39" id="label-39">Installing required external libraries</a></h2><!-- RDLabel: "Installing required external libraries" -->
|
1149
1152
|
<p>At this point for using BioRuby no additional libraries are needed, except if
|
1150
|
-
you are using Bio::PhyloXML module
|
1153
|
+
you are using the Bio::PhyloXML module; then you have to install libxml-ruby.</p>
|
1151
1154
|
<p>This may change, so keep an eye on the Bioruby website. Also when
|
1152
1155
|
a package is missing BioRuby should show an informative message.</p>
|
1153
1156
|
<p>At this point installing third party Ruby packages can be a bit
|
@@ -1155,16 +1158,16 @@ painful, as the gem standard for packages evolved late and some still
|
|
1155
1158
|
force you to copy things by hand. Therefore, carefully read the READMEs
|
1156
1159
|
that come with each package.</p>
|
1157
1160
|
<h3><a name="label-40" id="label-40">Installing libxml-ruby</a></h3><!-- RDLabel: "Installing libxml-ruby" -->
|
1158
|
-
<p>The simplest way is to use
|
1161
|
+
<p>The simplest way is to use the RubyGems packaging system:</p>
|
1159
1162
|
<pre>gem install -r libxml-ruby</pre>
|
1160
1163
|
<p>If you get `require': no such file to load - mkmf (LoadError) error then do</p>
|
1161
1164
|
<pre>sudo apt-get install ruby-dev</pre>
|
1162
|
-
<p>If you have other problems with installation, then see <a href="http://libxml.rubyforge.org/install.xml"><URL:http://libxml.rubyforge.org/install.xml></a
|
1165
|
+
<p>If you have other problems with installation, then see <a href="http://libxml.rubyforge.org/install.xml"><URL:http://libxml.rubyforge.org/install.xml></a>.</p>
|
1163
1166
|
<h2><a name="label-41" id="label-41">Trouble shooting</a></h2><!-- RDLabel: "Trouble shooting" -->
|
1164
1167
|
<ul>
|
1165
1168
|
<li>Error: in `require': no such file to load -- bio (LoadError)</li>
|
1166
1169
|
</ul>
|
1167
|
-
<p>Ruby
|
1170
|
+
<p>Ruby is failing to find the BioRuby libraries - add it to the RUBYLIB path, or pass
|
1168
1171
|
it to the interpreter. For example:</p>
|
1169
1172
|
<pre>ruby -I$BIORUBYPATH/lib yourprogram.rb</pre>
|
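The same effect as the -I option can be had from inside a script by extending Ruby's load path before requiring the library. This is only a sketch: BIORUBYPATH is the same placeholder as in the command above, and falling back to the current directory when it is unset is an assumption made for the example.

```ruby
# BIORUBYPATH is a placeholder for wherever the BioRuby source tree lives;
# fall back to the current directory if the variable is not set.
bioruby_lib = File.join(ENV.fetch('BIORUBYPATH', '.'), 'lib')

# Put the BioRuby lib directory at the front of the load path.
$LOAD_PATH.unshift(bioruby_lib) unless $LOAD_PATH.include?(bioruby_lib)

begin
  require 'bio'
  puts 'BioRuby loaded'
rescue LoadError => e
  # Still not found: the path above does not contain bio.rb.
  puts "BioRuby still not found: #{e.message}"
end
```

Setting the RUBYLIB environment variable to the same directory achieves this without touching the script.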
1170
1173
|
<h2><a name="label-42" id="label-42">Modifying this page</a></h2><!-- RDLabel: "Modifying this page" -->
|