bio 1.4.1 → 1.4.2
- data/ChangeLog +954 -0
- data/KNOWN_ISSUES.rdoc +40 -5
- data/README.rdoc +36 -35
- data/RELEASE_NOTES.rdoc +87 -59
- data/bioruby.gemspec +24 -2
- data/doc/RELEASE_NOTES-1.4.1.rdoc +104 -0
- data/doc/Tutorial.rd +162 -200
- data/doc/Tutorial.rd.html +149 -146
- data/lib/bio.rb +1 -0
- data/lib/bio/appl/blast.rb +1 -1
- data/lib/bio/appl/blast/ddbj.rb +26 -34
- data/lib/bio/appl/blast/genomenet.rb +21 -11
- data/lib/bio/db/embl/sptr.rb +193 -21
- data/lib/bio/db/fasta.rb +1 -1
- data/lib/bio/db/fastq.rb +14 -0
- data/lib/bio/db/fastq/format_fastq.rb +2 -2
- data/lib/bio/db/genbank/ddbj.rb +1 -2
- data/lib/bio/db/genbank/format_genbank.rb +1 -1
- data/lib/bio/db/medline.rb +1 -0
- data/lib/bio/db/newick.rb +3 -1
- data/lib/bio/db/pdb/pdb.rb +9 -9
- data/lib/bio/db/pdb/residue.rb +2 -2
- data/lib/bio/io/ddbjrest.rb +344 -0
- data/lib/bio/io/ncbirest.rb +121 -1
- data/lib/bio/location.rb +2 -2
- data/lib/bio/reference.rb +3 -4
- data/lib/bio/shell/plugin/entry.rb +7 -3
- data/lib/bio/shell/plugin/ncbirest.rb +5 -1
- data/lib/bio/util/restriction_enzyme.rb +3 -0
- data/lib/bio/util/restriction_enzyme/dense_int_array.rb +195 -0
- data/lib/bio/util/restriction_enzyme/range/sequence_range.rb +7 -7
- data/lib/bio/util/restriction_enzyme/range/sequence_range/calculated_cuts.rb +57 -18
- data/lib/bio/util/restriction_enzyme/range/sequence_range/fragment.rb +2 -2
- data/lib/bio/util/restriction_enzyme/sorted_num_array.rb +219 -0
- data/lib/bio/version.rb +1 -1
- data/sample/test_restriction_enzyme_long.rb +4403 -0
- data/test/data/fasta/EFTU_BACSU.fasta +8 -0
- data/test/data/genbank/CAA35997.gp +48 -0
- data/test/data/genbank/SCU49845.gb +167 -0
- data/test/data/litdb/1717226.litdb +13 -0
- data/test/data/pir/CRAB_ANAPL.pir +6 -0
- data/test/functional/bio/appl/blast/test_remote.rb +93 -0
- data/test/functional/bio/appl/test_blast.rb +61 -0
- data/test/functional/bio/io/test_ddbjrest.rb +47 -0
- data/test/functional/bio/test_command.rb +3 -3
- data/test/unit/bio/db/embl/test_sptr.rb +6 -6
- data/test/unit/bio/db/embl/test_uniprot_new_part.rb +208 -0
- data/test/unit/bio/db/genbank/test_common.rb +274 -0
- data/test/unit/bio/db/genbank/test_genbank.rb +401 -0
- data/test/unit/bio/db/genbank/test_genpept.rb +81 -0
- data/test/unit/bio/db/pdb/test_pdb.rb +3287 -11
- data/test/unit/bio/db/test_fasta.rb +34 -12
- data/test/unit/bio/db/test_fastq.rb +26 -0
- data/test/unit/bio/db/test_litdb.rb +95 -0
- data/test/unit/bio/db/test_medline.rb +1 -0
- data/test/unit/bio/db/test_nbrf.rb +82 -0
- data/test/unit/bio/db/test_newick.rb +22 -4
- data/test/unit/bio/test_reference.rb +35 -0
- data/test/unit/bio/util/restriction_enzyme/test_dense_int_array.rb +201 -0
- data/test/unit/bio/util/restriction_enzyme/test_sorted_num_array.rb +281 -0
- metadata +44 -38
data/doc/RELEASE_NOTES-1.4.1.rdoc
@@ -0,0 +1,104 @@
+= BioRuby 1.4.1 RELEASE NOTES
+
+A lot of changes have been made to BioRuby 1.4.1 since version 1.4.0
+was released. This document describes important and/or incompatible changes
+since the BioRuby 1.4.0 release.
+
+For known problems, see KNOWN_ISSUES.rdoc.
+
+== New features
+
+=== PAML Codeml support is significantly improved
+
+The PAML Codeml result parser has been completely rewritten and significantly
+improved. The code was developed by Pjotr Prins.
+
+=== KEGG PATHWAY and KEGG MODULE parser
+
+Parsers for KEGG PATHWAY and KEGG MODULE data are added. The code was developed
+by Kozo Nishida and Toshiaki Katayama.
+
+=== Bio::KEGG improvements
+
+The following new methods are added.
+
+* Bio::KEGG::GENES#keggclass, keggclasses, names_as_array, names,
+  motifs_as_strings, motifs_as_hash, motifs
+* Bio::KEGG::GENOME#original_databases
+
+=== Test codes are added and improved.
+
+Test codes are added and improved. They were developed by Kazuhiro Hayashi,
+Kozo Nishida, John Prince, and Naohisa Goto.
+
+=== Other new methods
+
+* Bio::Fastq#mask
+* Bio::Sequence#output_fasta
+* Bio::ClustalW::Report#get_sequence
+* Bio::Reference#==
+* Bio::Location#==
+* Bio::Locations#==
+* Bio::FastaNumericFormat#to_biosequence
+
+== Bug fixes
+
+=== Bio::Tree
+
+The following methods did not work correctly.
+
+* Bio::Tree#collect_edge!
+* Bio::Tree#remove_edge_if
+
+=== Bio::KEGG::GENES and Bio::KEGG::GENOME
+
+* Fixed bugs in Bio::KEGG::GENES#pathway.
+* Fixed parser errors due to the format changes of KEGG GENES and KEGG GENOME.
+
+=== Other bug fixes
+
+* Bio::Command no longer calls fork(2) on platforms that do not
+  support it.
+* Bio::MEDLINE#initialize now handles continuation lines correctly.
+* Fixed a typo and a missing field in Bio::GO::GeneAssociation#to_str.
+* Fixed a bug in Bio::FastaNumericFormat#to_biosequence.
+* Fixed a UniProt GN parsing issue in Bio::SPTR.
+
+== Incompatible changes
+
+=== Bio::PAML::Codeml::Report
+
+The code has been completely rewritten. See the RDoc for details.
+
+=== Bio::KEGG::ORTHOLOGY
+
+Bio::KEGG::ORTHOLOGY#pathways now returns a hash. The old pathways
+method is renamed to pathways_in_keggclass for compatibility.
+
+=== Bio::AAindex2
+
+Bio::AAindex2 now copies each symmetric element of a lower triangular matrix
+to the upper right part, because the Matrix class in Ruby 1.9.2 no longer
+accepts any dimension mismatches. We think the previous behavior was a bug.
+
+=== Bio::MEDLINE
+
+Bio::MEDLINE#reference no longer puts empty values in the returned
+Bio::Reference object. We think the previous behavior was a bug,
+and that the effect of this change is very small.
+
+== Known issues
+
+The following issues are added or updated. See KNOWN_ISSUES.rdoc for other
+already known issues.
+
+=== String escaping of command-line arguments in Ruby 1.9.X on Windows
+
+Since BioRuby 1.4.1, in Ruby 1.9.X running on Windows, escaping of
+command-line arguments is processed by the Ruby interpreter. Up to BioRuby
+1.4.0, the escaping was executed in Bio::Command#escape_shell_windows, and
+its behavior differed from that of the Ruby interpreter.
+
+Currently, due to this change, test/functional/bio/test_command.rb may fail
+on Windows with Ruby 1.9.X.
+
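The Bio::AAindex2 note above concerns filling a full square matrix from lower-triangular data. The plain-Ruby sketch below illustrates the idea on an Array of rows. `symmetric_from_lower` is a hypothetical helper written for illustration, not BioRuby's implementation. It assumes that row i holds i + 1 values, with the diagonal included.

```ruby
# Hypothetical helper (not part of BioRuby): expand lower-triangular rows
# into a full symmetric square matrix (an Array of equal-length rows).
def symmetric_from_lower(lower_rows)
  n = lower_rows.size
  lower_rows.each_with_index do |row, i|
    raise ArgumentError, "row #{i} must have #{i + 1} elements" unless row.size == i + 1
  end
  Array.new(n) do |i|
    Array.new(n) do |j|
      # (i, j) is stored directly when j <= i; otherwise use its mirror (j, i).
      j <= i ? lower_rows[i][j] : lower_rows[j][i]
    end
  end
end

lower = [
  [1],
  [2, 3],
  [4, 5, 6]
]
symmetric_from_lower(lower).each { |row| p row }
# [1, 2, 4]
# [2, 3, 5]
# [4, 5, 6]
```

Every row of the result has the same length. That is the property Ruby's `Matrix.rows` depends on once ragged rows are rejected, as the note above describes.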
data/doc/Tutorial.rd
CHANGED
@@ -30,7 +30,7 @@
 #
 #
 
-bioruby> $: << '../lib'
+bioruby> $: << '../lib' # make sure rubydoctest finds bioruby/lib
 
 =begin
 #doctest Testing bioruby
@@ -38,19 +38,19 @@ bioruby> $: << '../lib'
 = BioRuby Tutorial
 
 * Copyright (C) 2001-2003 KATAYAMA Toshiaki <k .at. bioruby.org>
-* Copyright (C) 2005-
+* Copyright (C) 2005-2011 Pjotr Prins, Naohisa Goto and others
 
-This document was last modified:
-Current editor:
+This document was last modified: 2011/03/24
+Current editor: Michael O'Keefe <okeefm (at) rpi (dot) edu>
 
-The latest version resides in the GIT source code repository: ./doc/((<Tutorial.rd|URL:
+The latest version resides in the GIT source code repository: ./doc/((<Tutorial.rd|URL:https://github.com/bioruby/bioruby/blob/master/doc/Tutorial.rd>)).
 
 == Introduction
 
 This is a tutorial for using Bioruby. A basic knowledge of Ruby is required.
-If you want to know more about the programming
+If you want to know more about the programming language, we recommend the
 latest Ruby book ((<Programming Ruby|URL:http://www.pragprog.com/titles/ruby>))
-by Dave Thomas and Andy Hunt - the first edition
+by Dave Thomas and Andy Hunt - the first edition can be read online
 ((<here|URL:http://www.ruby-doc.org/docs/ProgrammingRuby/>)).
 
 For BioRuby you need to install Ruby and the BioRuby package on your computer
@@ -60,7 +60,7 @@ version it has with the
 
 % ruby -v
 
-command.
+command. You should see something like:
 
 ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]
 
@@ -70,7 +70,7 @@ manager. For more information see the
 
 With Ruby download and install Bioruby using the links on the
 ((<Bioruby|URL:http://bioruby.org/>)) website. The recommended installation is via
-
+RubyGems:
 
 gem install bio
 
@@ -83,10 +83,13 @@ documentation can be viewed online at
 
 == Trying Bioruby
 
-Bioruby comes with its own shell. After unpacking the sources run the
-following command
+Bioruby comes with its own shell. After unpacking the sources, run one of the following commands:
 
-
+bioruby
+
+or, from the source tree:
+
+cd bioruby
 ruby -I lib bin/bioruby
 
 and you should see a prompt
@@ -110,11 +113,11 @@ question to the mailing list. BioRuby developers usually try to help.
 
 The Bio::Sequence class allows the usual sequence transformations and
 translations. In the example below the DNA sequence "atgcatgcaaaa" is
-converted into the complemental strand
-next the nucleic acid composition is calculated and the sequence is
+converted into the complementary strand and a subsequence is extracted;
+next, the nucleic acid composition is calculated and the sequence is
 translated into the amino acid sequence, the molecular weight
-calculated, and so on. When translating into amino acid sequences the
-frame can be specified and optionally the
+calculated, and so on. When translating into amino acid sequences, the
+frame can be specified and optionally the codon table selected (as
 defined in codontable.rb).
 
 bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa")
@@ -124,7 +127,7 @@ defined in codontable.rb).
 bioruby> seq.complement
 ==> "ttttgcatgcat"
 
-bioruby> seq.subseq(3,8) # gets subsequence of positions 3 to 8
+bioruby> seq.subseq(3,8) # gets subsequence of positions 3 to 8 (starting from 1)
 ==> "gcatgc"
 bioruby> seq.gc_percent
 ==> 33
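The sketch below reproduces the three results above on an ordinary String, which makes the 1-based convention concrete. The helper names are hypothetical. They only mimic Bio::Sequence::NA#complement, #subseq and #gc_percent for lowercase a/c/g/t input and are not BioRuby code.

```ruby
# Hypothetical helpers (not part of BioRuby) that mimic, on a plain
# lowercase String, the Bio::Sequence::NA calls shown above.

# Reverse complement: reverse the string, then swap a<->t and c<->g.
def reverse_complement(dna)
  dna.reverse.tr('acgt', 'tgca')
end

# 1-based, inclusive subsequence; positions below 1 are rejected.
def subseq_1based(dna, from, to)
  raise ArgumentError, 'positions start at 1' if from < 1 || to < 1
  dna[(from - 1)..(to - 1)]
end

# GC content as a truncated integer percentage of the a/c/g/t bases.
def gc_percent(dna)
  gc = dna.count('gc')
  at = dna.count('at')
  return 0 if gc + at == 0
  (gc * 100) / (gc + at)
end

seq = 'atgcatgcaaaa'
puts reverse_complement(seq)   # ttttgcatgcat
puts subseq_1based(seq, 3, 8)  # gcatgc
puts gc_percent(seq)           # 33
```

Position 3 in biological numbering becomes String index 2 inside `subseq_1based`. That is the same shift the tutorial asks for when using String#[].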
@@ -169,11 +172,11 @@ Windows). For example
 % ri p
 % ri File.open
 
-Nucleic acid sequence
-amino acid sequence
+Nucleic acid sequences are members of the Bio::Sequence::NA class, and
+amino acid sequences are members of the Bio::Sequence::AA class. Shared
 methods are in the parent Bio::Sequence class.
 
-As Bio::Sequence
+As Bio::Sequence inherits Ruby's String class, you can use
 String class methods. For example, to get a subsequence, you can
 not only use subseq(from, to) but also String#[].
 
@@ -189,15 +192,14 @@ has index 0, for example:
 
 So when using String methods, you should subtract 1 from positions
 conventionally used in biology. (subseq method will throw an exception if you
-specify positions smaller than or equal to 0 for either one of the "from" or
-"to".)
+specify positions smaller than or equal to 0 for either one of the "from" or "to".)
 
 The window_search(window_size, step_size) method shows a typical Ruby
 way of writing concise and clear code using 'closures'. Each sliding
 window creates a subsequence which is supplied to the enclosed block
 through a variable named +s+.
 
-* Show average percentage of GC content for 20 bases (stepping the default one base at a time)
+* Show average percentage of GC content for 20 bases (stepping the default one base at a time):
 
 bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa")
 ==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa"
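The sliding-window pattern described above can be sketched without BioRuby. `each_window` is a hypothetical helper, not Bio::Sequence#window_search. It yields each full-length window to the block, stepping `step_size` bases at a time. As an assumption of this sketch, it simply ignores a trailing fragment shorter than the window.

```ruby
# Hypothetical helper (not BioRuby's window_search): yield every full
# window of +window_size+ bases, moving +step_size+ bases each time.
def each_window(seq, window_size, step_size = 1)
  return if window_size <= 0 || step_size <= 0 || seq.length < window_size
  0.step(seq.length - window_size, step_size) do |start|
    yield seq[start, window_size]
  end
end

# GC percentage of one window, as a Float.
def gc_content(window)
  100.0 * window.count('gc') / window.length
end

seq = 'atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa'
# Collect the GC content of every 20-base window, one base at a time.
profile = []
each_window(seq, 20) { |s| profile << gc_content(s).round(1) }
p profile
```

The block receives each window as `s`, just like the `+s+` variable in the tutorial. All state the caller needs (here `profile`) is captured by the closure.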
@@ -268,8 +270,8 @@ For example:
 
 puts my_aaseq
 
-Save the program as na2aa.rb. Prepare a nucleic acid sequence
-described below and
+Save the program above as na2aa.rb. Prepare a nucleic acid sequence
+described below and save it as my_naseq.txt:
 
 gtggcgatctttccgaaagcgatgactggagcgaagaaccaaagcagtgacatttgtctg
 atgccgcacgtaggcctgataagacgcggacagcgtcgcatcaggcatcttgtgcaaatg
@@ -288,7 +290,7 @@ Outputs
 
 VAIFPKAMTGAKNQSSDICLMPHVGLIRRGQRRIRHLVQMSDAA*
 
-You can also write this, a bit
+You can also write this, a bit fancifully, as a one-liner script.
 
 % ruby -r bio -e 'p Bio::Sequence::NA.new($<.read).translate' my_naseq.txt
 
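Behind `translate` is a codon lookup. The sketch below illustrates it in plain Ruby using the standard genetic code (NCBI translation table 1), packed into a 64-character string in t, c, a, g order. `translate_frame1` is a hypothetical helper. It reads frame 1 only, maps codons containing unknown bases to X, and drops a trailing partial codon. It is not BioRuby's implementation, which also supports other frames and codon tables (see codontable.rb).

```ruby
# Standard genetic code (NCBI table 1): one amino acid letter per codon,
# with bases ordered t, c, a, g at each of the three codon positions.
STANDARD_CODE = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
BASE_INDEX = { 't' => 0, 'c' => 1, 'a' => 2, 'g' => 3 }.freeze

# Hypothetical helper: translate reading frame 1 of a DNA/RNA string.
def translate_frame1(dna)
  bases = dna.downcase.gsub(/\s+/, '').tr('u', 't')
  codons = bases.scan(/.../)   # whole codons only; a partial tail is dropped
  codons.map do |codon|
    idx = codon.chars.map { |b| BASE_INDEX[b] }
    next 'X' if idx.include?(nil)   # ambiguous or unknown base
    STANDARD_CODE[idx[0] * 16 + idx[1] * 4 + idx[2]]
  end.join
end

# First line of my_naseq.txt from the tutorial:
puts translate_frame1('gtggcgatctttccgaaagcgatgactggagcgaagaaccaaagcagtgacatttgtctg')
# VAIFPKAMTGAKNQSSDICL
```

The printed peptide matches the first 20 residues of the tutorial's expected output.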
@@ -301,7 +303,7 @@ sequence files. One generic example of the above can be found in
 We assume that you already have some GenBank data files. (If you don't,
 download some .seq files from ftp://ftp.ncbi.nih.gov/genbank/)
 
-As an example we fetch the ID, definition and sequence of each entry
+As an example we will fetch the ID, definition and sequence of each entry
 from the GenBank format and convert it to FASTA. This is also an example
 script in the BioRuby distribution.
 
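The FASTA side of that conversion is simple enough to sketch in plain Ruby. `to_fasta_record` is a hypothetical helper, not BioRuby's own FASTA formatter. It writes a `>` header from an ID and a definition, then wraps the sequence at 60 characters per line. The 60-character width is an arbitrary default chosen for this sketch, and the ID below is made up.

```ruby
# Hypothetical helper: format one FASTA record from an ID, a definition
# and a sequence, wrapping the sequence at +width+ characters per line.
def to_fasta_record(entry_id, definition, seq, width = 60)
  header = ">#{entry_id} #{definition}".rstrip
  body = seq.gsub(/\s+/, '').scan(/.{1,#{width}}/)
  ([header] + body).join("\n") + "\n"
end

# 'AB000001' is a made-up ID used only for this example.
print to_fasta_record('AB000001', 'example sequence', 'acgt' * 20)
```

The 80-base example sequence comes out as one 60-character line followed by a 20-character line.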
@@ -349,7 +351,7 @@ For example, in turn, reading FASTA format files:
 puts "naseq : " + f.naseq
 end
 
-In above two scripts, the first arguments of Bio::FlatFile.new are
+In the above two scripts, the first arguments of Bio::FlatFile.new are
 database classes of BioRuby. This is expanded on in a later section.
 
 Again another option is to use the Bio::DB.open class:
@@ -408,12 +410,9 @@ very complicated:
 
 * Note: In this example Feature#assoc method makes a Hash from a
 feature object. It is useful because you can get data from the hash
-by using qualifiers as keys.
-(But there is a risk some information is lost when two or more
-qualifiers are the same. Therefore an Array is returned by
-Feature#feature)
+by using qualifiers as keys. But there is a risk that some information is lost when two or more qualifiers have the same name. Therefore an Array is returned by Feature#feature.
 
-Bio::Sequence#splicing splices
+Bio::Sequence#splicing splices subsequences from nucleic acid sequences
 according to location information used in GenBank, EMBL and DDBJ.
 
 When the specified translation table is different from the default
@@ -434,7 +433,7 @@ bio/location.rb.
 locs = Bio::Locations.new('join((8298.8300)..10206,1..855)')
 naseq.splicing(locs)
 
-You can also use
+You can also use this splicing method for amino acid sequences
 (Bio::Sequence::AA objects).
 
 * Splicing peptide from a protein (e.g. signal peptide)
@@ -450,11 +449,7 @@ the ./lib/bio/db directory of the BioRuby source tree.
 
 In many cases the Bio::DatabaseClass acts as a factory pattern
 and recognises the database type automatically - returning a
-parsed object. For example using Bio::FlatFile
-
-Bio::FlatFile class as described above. The first argument of the
-Bio::FlatFile.new is database class name in BioRuby (such as Bio::GenBank,
-Bio::KEGG::GENES and so on).
+parsed object. For example, when using the Bio::FlatFile class as described above, the first argument of Bio::FlatFile.new is a database class name in BioRuby (such as Bio::GenBank, Bio::KEGG::GENES and so on).
 
 ff = Bio::FlatFile.new(Bio::DatabaseClass, ARGF)
 
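For plain FASTA input, reading and grep-style filtering can also be sketched without BioRuby. `each_fasta_entry` is a hypothetical helper for illustration only. Unlike Bio::FlatFile, it does not detect formats automatically. It assumes well-formed input in which every record starts with a `>` line. The two records in the example are invented.

```ruby
require 'stringio'

# Hypothetical helper: yield [definition, sequence] for each FASTA record
# read from an IO-like object.
def each_fasta_entry(io)
  definition = nil
  lines = []
  io.each_line do |line|
    line = line.chomp
    if line.start_with?('>')
      yield definition, lines.join unless definition.nil?
      definition = line[1..-1].strip
      lines = []
    elsif definition
      lines << line.strip
    end
  end
  yield definition, lines.join unless definition.nil?
end

# Made-up records, for illustration only.
data = StringIO.new(<<~FASTA)
  >At1g01010 hypothetical Arabidopsis entry
  atgcatgc
  aatt
  >Hs0001 hypothetical human entry
  ggccggcc
FASTA

# Keep entries whose definition matches /At|Dm/, like the fastagrep.rb example.
each_fasta_entry(data) do |definition, seq|
  puts ">#{definition}\n#{seq}" if definition =~ /At|Dm/
end
# prints only the At1g01010 record
```

Because the helper takes any object that responds to `each_line`, you can pass it `ARGF` or an opened `File` in place of the StringIO used here.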
@@ -472,19 +467,18 @@ database class?
 p entry.seq # sequence data of the entry
 end
 
-An example that can take any input, filter using a regular expression
+An example that can take any input, filter using a regular expression and output
 to a FASTA file can be found in sample/any2fasta.rb. With this technique it is
 possible to write a Unix type grep/sort pipe for sequence information. One
 example using scripts in the BIORUBY sample folder:
 
 fastagrep.rb '/At|Dm/' database.seq | fastasort.rb
 
-greps the database for Arabidopsis and Drosophila entries and sorts the output
-to FASTA.
+greps the database for Arabidopsis and Drosophila entries and sorts the output to FASTA.
 
 Other methods to extract specific data from database objects can be
 different between databases, though some methods are common (see the
-guidelines for common methods
+guidelines for common methods in bio/db.rb).
 
 * entry_id --> gets ID of the entry
 * definition --> gets definition of the entry
@@ -495,16 +489,15 @@ guidelines for common methods as described in bio/db.rb).
 Refer to the documents of each database to find the exact naming
 of the included methods.
 
-In
-name is plural the method returns some object as an Array. For
+In general, BioRuby uses the following conventions: when a method
+name is plural, the method returns some object as an Array. For
 example, some classes have a "references" method which returns
 multiple Bio::Reference objects as an Array. And some classes have a
 "reference" method which returns a single Bio::Reference object.
 
 === Alignments (Bio::Alignment)
 
-Bio::Alignment class in bio/alignment.rb is a container class like Ruby's Hash
-Array and BioPerl's Bio::SimpleAlign. A very simple example is:
+The Bio::Alignment class in bio/alignment.rb is a container class like Ruby's Hash and Array classes and BioPerl's Bio::SimpleAlign. A very simple example is:
 
 bioruby> seqs = [ 'atgca', 'aagca', 'acgca', 'acgcg' ]
 bioruby> seqs = seqs.collect{ |x| Bio::Sequence::NA.new(x) }
|
|
536
529
|
factory = Bio::ClustalW.new
|
537
530
|
a2 = a.do_align(factory)
|
538
531
|
|
532
|
+
Read a ClustalW or Muscle 'ALN' alignment file:
|
533
|
+
|
534
|
+
bioruby> aln = Bio::ClustalW::Report.new(File.read('../test/data/clustalw/example1.aln'))
|
535
|
+
bioruby> aln.header
|
536
|
+
==> "CLUSTAL 2.0.9 multiple sequence alignment"
|
537
|
+
|
538
|
+
Fetch a sequence:
|
539
|
+
|
540
|
+
bioruby> seq = aln.get_sequence(1)
|
541
|
+
bioruby> seq.definition
|
542
|
+
==> "gi|115023|sp|P10425|"
|
543
|
+
|
544
|
+
Get a partial sequence:
|
545
|
+
|
546
|
+
bioruby> seq.to_s[60..120]
|
547
|
+
==> "LGYFNG-EAVPSNGLVLNTSKGLVLVDSSWDNKLTKELIEMVEKKFQKRVTDVIITHAHAD"
|
548
|
+
|
549
|
+
Show the full alignment residue match information for the sequences in the set:
|
550
|
+
|
551
|
+
bioruby> aln.match_line[60..120]
|
552
|
+
==> " . **. . .. ::*: . * : : . .: .* * *"
|
553
|
+
|
554
|
+
Return a Bio::Alignment object:
|
555
|
+
|
556
|
+
bioruby> aln.alignment.consensus[60..120]
|
557
|
+
==> "???????????SN?????????????D??????????L??????????????????H?H?D"
|
558
|
+
|
539
559
|
== Restriction Enzymes (Bio::RE)
|
540
560
|
|
541
561
|
BioRuby has extensive support for restriction enzymes (REs). It contains a full
|
542
562
|
library of commonly used REs (from REBASE) which can be used to cut single
|
543
|
-
stranded RNA or
|
563
|
+
stranded RNA or double stranded DNA into fragments. To list all enzymes:
|
544
564
|
|
545
565
|
rebase = Bio::RestrictionEnzyme.rebase
|
546
566
|
rebase.each do |enzyme_name, info|
|
547
567
|
p enzyme_name
|
548
568
|
end
|
549
569
|
|
550
|
-
and cut a sequence with an enzyme follow up with:
|
570
|
+
and to cut a sequence with an enzyme follow up with:
|
551
571
|
|
552
572
|
res = seq.cut_with_enzyme('EcoRII', {:max_permutations => 0},
|
553
573
|
{:view_ranges => true})
|
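The sketch below shows, in plain Ruby, what cutting at a recognition site means on a single strand. `cut_at_sites` is a hypothetical helper, not Bio::RestrictionEnzyme. It takes a regular expression for the site and a cut offset within that site. It finds non-overlapping sites only and ignores the complementary strand and the permutations that BioRuby handles. The example uses EcoRI, which cuts its site GAATTC after the first base (G^AATTC).

```ruby
# Hypothetical helper: cut one strand at every non-overlapping match of
# +site+, +offset+ bases into the recognition site.
def cut_at_sites(seq, site, offset)
  cuts = []
  pos = 0
  while (m = site.match(seq, pos))
    cuts << m.begin(0) + offset
    pos = [m.end(0), m.begin(0) + 1].max  # always advance, even on empty matches
  end
  bounds = [0] + cuts + [seq.length]
  bounds.each_cons(2).map { |from, to| seq[from...to] }.reject(&:empty?)
end

# EcoRI: G^AATTC, so the cut falls 1 base into the site.
p cut_at_sites('aagaattcggtgaattca', /gaattc/i, 1)
# ["aag", "aattcggtg", "aattca"]
```

Each fragment ends just before the next cut point, so joining the fragments returns the original sequence.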
@@ -577,13 +597,14 @@ and cut a sequence with an enzyme follow up with:
 Let's start with a query.pep file which contains a sequence in FASTA
 format. In this example we are going to execute a homology search
 from a remote internet site or on your local machine. Note that you
-can use the ssearch program instead of fasta when you use
+can use the ssearch program instead of fasta when you run it on your
 local machine.
 
 === using FASTA in local machine
 
 Install the fasta program on your machine (the command name looks like
 fasta34. FASTA can be downloaded from ftp://ftp.virginia.edu/pub/fasta/).
+
 First, you must prepare your FASTA-formatted database sequence file
 target.pep and FASTA-formatted query.pep.
 
@@ -619,7 +640,7 @@ target.pep and FASTA-formatted query.pep.
 end
 end
 
-We named above script
+We named the above script f_search.rb. You can execute it as follows:
 
 % ./f_search.rb query.pep target.pep > f_search.out
 
@@ -630,14 +651,13 @@ Bio::Sequence#fasta method can be used.
 seq = ">test seq\nYQVLEEIGRGSFGSVRKVIHIPTKKLLVRKDIKYGHMNSKE"
 seq.fasta(factory)
 
-When you want to add options to FASTA
-third argument of Bio::Fasta.local method. For example,
-and getting top-10 hits:
+When you want to add options to FASTA commands, you can set the
+third argument of the Bio::Fasta.local method. For example, the following sets ktup to 1 and gets a list of the top 10 hits:
 
 factory = Bio::Fasta.local('fasta34', 'target.pep', '-b 10')
 factory.ktup = 1
 
-Bio::Fasta#query returns Bio::Fasta::Report object.
+Bio::Fasta#query returns a Bio::Fasta::Report object.
 We can get almost all information described in FASTA report text
 with the Report object. For example, getting information for hits:
 
@@ -665,12 +685,11 @@ with the Report object. For example, getting information for hits:
 puts hit.lap_at # array of above four numbers
 end
 
-Most of above methods are common
-below. Please refer to
+Most of the above methods are common to the Bio::Blast::Report described
+below. Please refer to the documentation of the Bio::Fasta::Report class for
 FASTA-specific details.
 
-If you need original output text of FASTA program you can use the "output"
-method of the factory object after the "query" method.
+If you need the original output text of the FASTA program, you can use the "output" method of the factory object after the "query" method.
 
 report = factory.query(entry)
 puts factory.output
 
@@ -698,15 +717,15 @@ Available databases in GenomeNet:
 Select the databases you require. Next, give the search program from
 the type of query sequence and database.
 
-* When query is
+* When query is an amino acid sequence
 * When protein database, program is "fasta".
 * When nucleic database, program is "tfasta".
 
 * When query is a nucleic acid sequence
 * When nucleic database, program is "fasta".
-* (When protein database,
+* (When protein database, the search would fail.)
 
-For example:
+For example, run:
 
 program = 'fasta'
 database = 'genes'
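The program-selection rules listed above amount to a small lookup table. The sketch below encodes them in plain Ruby. `fasta_program_for` and the `:protein` / `:nucleic` symbols are hypothetical names used only for illustration. Here `:protein` stands for an amino acid query or a protein database.

```ruby
# Hypothetical helper encoding the GenomeNet FASTA program rules above.
def fasta_program_for(query_type, db_type)
  case [query_type, db_type]
  when [:protein, :protein] then 'fasta'
  when [:protein, :nucleic] then 'tfasta'
  when [:nucleic, :nucleic] then 'fasta'
  when [:nucleic, :protein]
    raise ArgumentError, 'a nucleic acid query against a protein database would fail'
  else
    raise ArgumentError, "unknown combination: #{query_type.inspect}, #{db_type.inspect}"
  end
end

puts fasta_program_for(:protein, :nucleic)   # tfasta
```

Raising an error for the unsupported combination surfaces the mistake before any remote search is attempted.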
@@ -741,7 +760,7 @@ The parameter "program" is different from FASTA - as you can expect:
 Bio::BLAST uses "-m 7" XML output of BLAST by default when either
 XMLParser or REXML (both of them are XML parser libraries for Ruby -
 of the two XMLParser is the fastest) is installed on your computer. In
-Ruby version 1.8.0
+Ruby version 1.8.0 or later, REXML is bundled with Ruby's
 distribution.
 
 When no XML parser library is present, Bio::BLAST uses "-m 8" tabular
@@ -776,10 +795,10 @@ midline.
 end
 
 For simplicity and API compatibility, some information such as score
-
+is extracted from the first Hsp (High-scoring Segment Pair).
 
 Check the documentation for Bio::Blast::Report to see what can be
-retrieved. For now suffice to
+retrieved. For now, suffice it to say that Bio::Blast::Report has a
 hierarchical structure mirroring the general BLAST output stream:
 
 * In a Bio::Blast::Report object, @iterations is an array of
@@ -854,65 +873,12 @@ Bio::Blast::Report.new(or Bio::Blast::Default::Report.new):
 
 factory = Bio::Blast.remote(program, db, option, 'MYSITE')
 
-When you write above routines, please send to the BioRuby project and
-they may be included.
+When you write the above routines, please send them to the BioRuby project, and they may be included in future releases.
 
 == Generate a reference list using PubMed (Bio::PubMed)
-=end
-(EDITORs NOTE: examples in this section do not work and should be rewritten.)
-
-Below script is an example which seaches PubMed and creates a reference list.
-
-ARGV.each do |id|
-entry = Bio::PubMed.query(id) # searches PubMed and get entry
-medline = Bio::MEDLINE.new(entry) # creates Bio::MEDLINE object from entry text
-reference = medline.reference # converts into Bio::Reference object
-puts reference.bibtex # shows BibTeX formatted text
-end
-
-We named the script pmfetch.rb.
-
-% ./pmfetch.rb 11024183 10592278 10592173
-
-To give some PubMed ID (PMID) in arguments, the script retrieves informations
-from NCBI, parses MEDLINE format text, converts into BibTeX format and
-shows them.
-
-A keyword search is also available.
-
-#!/usr/bin/env ruby
-
-require 'bio'
-
-# Concatinates argument keyword list to a string
-keywords = ARGV.join(' ')
-
-# PubMed keyword search
-entries = Bio::PubMed.search(keywords)
-
-entries.each do |entry|
-medline = Bio::MEDLINE.new(entry) # creates Bio::MEDLINE object from text
-reference = medline.reference # converts into Bio::Reference object
-puts reference.bibtex # shows BibTeX format text
-end
-
-We named the script pmsearch.rb.
-
-% ./pmsearch.rb genome bioinformatics
-
-To give keywords in arguments, the script searches PubMed by given
-keywords and shows bibliography informations in a BibTex format. Other
-output formats are also avaialble like the bibitem method described
-below. Some journal formats like nature and nar can be used, but lack
-bold and italic font output.
-
-(EDITORs NOTE: do we have some simple object that can be queried for
-author, title etc.?)
-=begin
 
 Nowadays using NCBI E-Utils is recommended. Use Bio::PubMed.esearch
-and Bio::PubMed.efetch
-
+and Bio::PubMed.efetch.
 
 #!/usr/bin/env ruby
 
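Bio::PubMed.esearch, recommended above, works through NCBI's E-utilities web service. The sketch below shows the shape of such a request. It only builds an ESearch URL with Ruby's standard URI library and never contacts NCBI. `esearch_url` is a hypothetical helper, not the BioRuby API. The `db`, `term` and `retmax` parameters follow the public ESearch interface, and NCBI's E-utilities documentation remains the authority on them.

```ruby
require 'uri'

ESEARCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'

# Hypothetical helper: build (but do not send) an ESearch query URL.
def esearch_url(term, db: 'pubmed', retmax: 20)
  query = URI.encode_www_form(db: db, term: term, retmax: retmax)
  "#{ESEARCH_URL}?#{query}"
end

puts esearch_url('genome bioinformatics', retmax: 5)
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=genome+bioinformatics&retmax=5
```

`URI.encode_www_form` takes care of escaping, so spaces become `+` and characters such as `&` inside a search term cannot break the query string.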
@@ -959,7 +925,7 @@ BibTeX format bibliography data to a file named genoinfo.bib.
 
 The BibTeX can be used with Tex or LaTeX to form bibliography
 information with your journal article. For more information
-on BibTex see (
+on using BibTeX see ((<BibTeX HowTo site|URL:http://www.bibtex.org/Using/>)). A quick example:
 
 Save this to hoge.tex:
 
@@ -977,14 +943,13 @@ Then,
 % latex hoge # creates bibliography list
 % latex hoge # inserts correct bibliography reference
 
-Now, you get hoge.dvi and hoge.ps - the latter
-Postscript viewer.
+Now, you get hoge.dvi and hoge.ps - the latter of which can be viewed with any PostScript viewer.
 
 === Bio::Reference#bibitem
 
 When you don't want to create a bib file, you can use
 Bio::Reference#bibitem method instead of Bio::Reference#bibtex.
-In above pmfetch.rb and pmsearch.rb scripts, change
+In the above pmfetch.rb and pmsearch.rb scripts, change
 
 puts reference.bibtex
 to
@@ -1031,11 +996,11 @@ BioRuby and other projects' members (2002).
 * Server-client model for getting entry from database via http.
 
 * BioSQL
-* Schemas to store sequence data to relational
+* Schemas to store sequence data to relational databases such as
 MySQL and PostgreSQL, and methods to retrieve entries from the database.
 
-
-((<URL:http://obda.open-bio.org
+This tutorial only gives a quick overview of OBDA. Check out
+((<the OBDA site|URL:http://obda.open-bio.org>)) for more extensive details.
 
 == BioRegistry
 
@@ -1053,17 +1018,17 @@ when all local configulation files are not available.
 In the current BioRuby implementation all local configulation files
 are read. For databases with the same name settings encountered first
 are used. This means that if you don't like some settings of a
-database in system global configuration file
-(/etc/bioinformatics/seqdatabase.ini), you can easily override
+database in the system's global configuration file
+(/etc/bioinformatics/seqdatabase.ini), you can easily override them by
 writing settings to ~/.bioinformatics/seqdatabase.ini.
 
 The syntax of the configuration file is called a stanza format. For example
 
 [DatabaseName]
 protocol=ProtocolName
-location=
+location=ServerName
 
-You can write a description like above entry for every database.
+You can write a description like the above entry for every database.
 
 The database name is a local label for yourself, so you can name it
 freely and it can differ from the name of the actual databases. In the
@@ -1073,8 +1038,8 @@ connection to the database is tried sequentially with the order
|
|
1073
1038
|
written in configuration files. However, this has not (yet) been
|
1074
1039
|
implemented in BioRuby.
|
1075
1040
|
|
1076
|
-
In addition, for some
|
1077
|
-
other than locations (e.g. user name
|
1041
|
+
In addition, for some protocols, you must set additional options
|
1042
|
+
other than locations (e.g. user name for MySQL). In the BioRegistry
|
1078
1043
|
specification, the currently available protocols are:
|
1079
1044
|
|
1080
1045
|
* index-flat
|
@@ -1088,8 +1053,7 @@ In BioRuby, you can use index-flat, index-berkeleydb, biofetch and biosql.
|
|
1088
1053
|
Note that the BioRegistry specification sometimes gets updated and BioRuby
|
1089
1054
|
does not always follow quickly.
|
1090
1055
|
|
1091
|
-
Here an example.
|
1092
|
-
files:
|
1056
|
+
Here is an example. It creates a Bio::Registry object and reads the configuration files:
|
1093
1057
|
|
1094
1058
|
reg = Bio::Registry.new
|
1095
1059
|
|
@@ -1100,41 +1064,38 @@ files:
|
|
1100
1064
|
entry = serv.get_by_id('AA2CG')
|
1101
1065
|
|
1102
1066
|
|
1103
|
-
The variable "serv" is a server object corresponding to the
|
1104
|
-
written in configuration files. The class of the object is one of
|
1067
|
+
The variable "serv" is a server object corresponding to the settings
|
1068
|
+
written in the configuration files. The class of the object is one of
|
1105
1069
|
Bio::SQL, Bio::Fetch, and so on. Note that Bio::Registry#get_database("name")
|
1106
1070
|
returns nil if no database is found.
|
1107
1071
|
|
1108
|
-
After that, you can use get_by_id method and some specific methods.
|
1109
|
-
Please refer to below
|
1072
|
+
After that, you can use the get_by_id method and some specific methods.
|
1073
|
+
Please refer to the sections below for more information.
|
1110
1074
|
|
1111
1075
|
== BioFlat
|
1112
1076
|
|
1113
1077
|
BioFlat is a mechanism to create index files of flat files and to retrieve
|
1114
1078
|
these entries fast. There are two index types. index-flat is a simple index
|
1115
|
-
performing binary search without using
|
1079
|
+
performing binary search without using any external Ruby libraries. index-berkeleydb
|
1116
1080
|
uses Berkeley DB for indexing - but requires installing bdb on your computer,
|
1117
|
-
as well as the BDB Ruby package.
|
1118
|
-
br_bioflat.rb command bundled with BioRuby.
|
1081
|
+
as well as the BDB Ruby package. To create the index itself, you can use the br_bioflat.rb command bundled with BioRuby.
|
1119
1082
|
|
1120
1083
|
% br_bioflat.rb --makeindex database_name [--format data_format] filename...
|
1121
1084
|
|
1122
1085
|
The format can be omitted because BioRuby has autodetection. If that
|
1123
|
-
|
1124
|
-
database class.
|
1086
|
+
doesn't work, you can try specifying the data format as the name of a BioRuby database class.
|
1125
1087
|
|
1126
1088
|
To search and retrieve data from the database:
|
1127
1089
|
|
1128
1090
|
% br_bioflat.rb database_name identifier
|
1129
1091
|
|
1130
|
-
For example, to create index of GenBank files gbbct*.seq and get entry
|
1131
|
-
from the database:
|
1092
|
+
For example, to create an index of GenBank files gbbct*.seq and get the entry from the database:
|
1132
1093
|
|
1133
1094
|
% br_bioflat.rb --makeindex my_bctdb --format GenBank gbbct*.seq
|
1134
1095
|
% br_bioflat.rb my_bctdb A16STM262
|
1135
1096
|
|
1136
1097
|
If you have Berkeley DB on your system and have installed the bdb extension
|
1137
|
-
module of Ruby (see http://raa.ruby-lang.org/project/bdb
|
1098
|
+
module of Ruby (see ((<the BDB project page|URL:http://raa.ruby-lang.org/project/bdb/>)) ), you can
|
1138
1099
|
create and search indexes with Berkeley DB - a very fast alternative
|
1139
1100
|
that uses little computer memory. When creating the index, use the
|
1140
1101
|
"--makeindex-bdb" option instead of "--makeindex".
|
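The idea behind index-flat (a sorted index searched by binary search) can be sketched in a few lines of Ruby. This is an illustration under stated assumptions: `IndexEntry`, `build_index` and `lookup` are hypothetical names, the offsets are invented, and the real file layout written by br_bioflat.rb is different.

```ruby
# Hypothetical sketch of the index-flat idea: keep (id, byte offset) pairs
# sorted by id and locate an entry with binary search. Not the on-disk
# format that br_bioflat.rb actually writes.
IndexEntry = Struct.new(:id, :offset)

def build_index(pairs)
  pairs.map { |id, offset| IndexEntry.new(id, offset) }.sort_by(&:id)
end

def lookup(index, id)
  # Array#bsearch in find-minimum mode: first entry whose id >= the target
  hit = index.bsearch { |e| e.id >= id }
  hit && hit.id == id ? hit.offset : nil
end

# Invented offsets into a flat file of entries
index = build_index([["X55027", 4096], ["A16STM262", 1024], ["AB000100", 0]])
puts lookup(index, "A16STM262")
```

A returned offset would then be used to seek into the flat file and read the entry; `nil` means the identifier is not indexed.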
@@ -1145,12 +1106,12 @@ that uses little computer memory. When creating the index, use the
|
|
1145
1106
|
|
1146
1107
|
Note: this section is an advanced topic
|
1147
1108
|
|
1148
|
-
BioFetch is a database retrieval mechanism via CGI.
|
1149
|
-
options and error codes are standardized.
|
1109
|
+
BioFetch is a database retrieval mechanism via CGI. CGI parameters,
|
1110
|
+
options and error codes are standardized. Client access via
|
1150
1111
|
http is possible giving the database name, identifiers and format to
|
1151
1112
|
retrieve entries.
|
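As a rough illustration of "giving the database name, identifiers and format", the sketch below only composes a query URL and sends no request. The endpoint is a placeholder, and the parameter names (db, id, format, style) are assumptions based on the common BioFetch/dbfetch convention, not something this tutorial states; `biofetch_url` is a hypothetical helper.

```ruby
require 'uri'

# Hypothetical sketch: compose a BioFetch-style query URL.
# The endpoint is a placeholder; parameter names are assumed. No request is sent.
def biofetch_url(endpoint, db, ids, format, style = "raw")
  query = URI.encode_www_form(
    db: db,
    id: Array(ids).join(","),  # several identifiers joined by commas
    format: format,
    style: style
  )
  "#{endpoint}?#{query}"
end

puts biofetch_url("http://example.org/cgi-bin/biofetch", "genbank", "AA2CG", "default")
```

In practice, Bio::Fetch handles this for you; the sketch only shows which pieces of information a client has to supply.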
1152
1113
|
|
1153
|
-
The BioRuby project has a BioFetch server
|
1114
|
+
The BioRuby project has a BioFetch server at bioruby.org. It uses
|
1154
1115
|
GenomeNet's DBGET system as a backend. The source code of the
|
1155
1116
|
server is in sample/ directory. Currently, there are only two
|
1156
1117
|
BioFetch servers in the world: bioruby.org and EBI.
|
@@ -1176,8 +1137,8 @@ Here are some methods to retrieve entries from our BioFetch server.
|
|
1176
1137
|
serv = reg.get_database('genbank')
|
1177
1138
|
entry = serv.get_by_id('AA2CG')
|
1178
1139
|
|
1179
|
-
If you want to use (4), you
|
1180
|
-
in seqdatabase.ini.
|
1140
|
+
If you want to use (4), you have to include some settings
|
1141
|
+
in seqdatabase.ini. For example:
|
1181
1142
|
|
1182
1143
|
[genbank]
|
1183
1144
|
protocol=biofetch
|
@@ -1186,11 +1147,11 @@ in seqdatabase.ini. E.g.
|
|
1186
1147
|
|
1187
1148
|
=== The combination of BioFetch, Bio::KEGG::GENES and Bio::AAindex1
|
1188
1149
|
|
1189
|
-
Bioinformatics is often about
|
1190
|
-
example
|
1191
|
-
Halobacterium from KEGG GENES database and
|
1150
|
+
Bioinformatics is often about gluing things together. Here is an
|
1151
|
+
example that gets the bacteriorhodopsin gene (VNG1467G) of the archaea
|
1152
|
+
Halobacterium from the KEGG GENES database and gets alpha-helix index
|
1192
1153
|
data (BURA740101) from the AAindex (Amino acid indices and similarity
|
1193
|
-
matrices) database, and
|
1154
|
+
matrices) database, and shows the helix score for each 15-aa length
|
1194
1155
|
overlapping window.
|
1195
1156
|
|
1196
1157
|
#!/usr/bin/env ruby
|
@@ -1212,16 +1173,16 @@ overlapping window.
|
|
1212
1173
|
position += 1
|
1213
1174
|
end
|
1214
1175
|
|
1215
|
-
The special method Bio::Fetch.query uses preset BioFetch server
|
1216
|
-
|
1176
|
+
The special method Bio::Fetch.query uses the preset BioFetch server
|
1177
|
+
at bioruby.org. (The server internally gets data from GenomeNet.
|
1217
1178
|
Because the KEGG/GENES database and AAindex database are not available
|
1218
|
-
from other BioFetch servers, we used bioruby.org server with
|
1179
|
+
from other BioFetch servers, we used the bioruby.org server with
|
1219
1180
|
Bio::Fetch.query method.)
|
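The overlapping-window scoring can be illustrated without any network access. In this hypothetical sketch, `window_scores` sums a per-residue index over every 15-residue window, counting residues missing from the index as 0.0. Whether the tutorial's script sums or averages is not visible in this excerpt, so treat summing as an assumption. The toy index values are invented and are not the AAindex BURA740101 data.

```ruby
# Hypothetical sketch of 15-residue overlapping-window scoring: for each
# window, sum a per-residue index (residues missing from the index count 0.0).
# The toy values below are invented; they are NOT the AAindex BURA740101 data.
def window_scores(seq, index, window = 15)
  seq.chars.each_cons(window).map do |residues|
    residues.sum { |aa| index.fetch(aa, 0.0) }
  end
end

toy_index = { "A" => 1.0, "G" => 0.0 }
seq = "A" * 10 + "G" * 10  # 20 residues -> 6 overlapping windows

window_scores(seq, toy_index).each.with_index(1) do |score, position|
  puts [position, score].join("\t")
end
```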
1220
1181
|
|
1221
1182
|
== BioSQL
|
1222
1183
|
|
1223
|
-
BioSQL is a well known schema to store and retrive biological sequences using a RDBMS like PostgreSQL or MySQL
|
1224
|
-
First of all, you must install a database engine or have access to a remote one. Then create the schema and populate with the taxonomy. You can follow the ((<Official Guide|URL:http://code.open-bio.org/svnweb/index.cgi/biosql/view/biosql-schema/trunk/INSTALL>)) .
|
1184
|
+
BioSQL is a well-known schema to store and retrieve biological sequences using an RDBMS such as PostgreSQL or MySQL; note that SQLite is not supported.
|
1185
|
+
First of all, you must install a database engine or have access to a remote one. Then create the schema and populate it with the taxonomy. You can follow the ((<Official Guide|URL:http://code.open-bio.org/svnweb/index.cgi/biosql/view/biosql-schema/trunk/INSTALL>)) to accomplish these steps.
|
1225
1186
|
The next step is to install these gems:
|
1226
1187
|
* ActiveRecord
|
1227
1188
|
* CompositePrimaryKeys (Rails does not handle composite primary keys by default)
|
@@ -1230,22 +1191,23 @@ Next step is to install these gems:
|
|
1230
1191
|
|
1231
1192
|
You can find the ActiveRecord models in /bioruby/lib/bio/io/biosql.
|
1232
1193
|
|
1233
|
-
When you have your database up and running, you can connect to it
|
1194
|
+
When you have your database up and running, you can connect to it like this:
|
1234
1195
|
|
1235
1196
|
#!/usr/bin/env ruby
|
1236
1197
|
|
1237
1198
|
require 'bio'
|
1238
1199
|
|
1239
1200
|
connection = Bio::SQL.establish_connection({'development'=>{'hostname'=>"YourHostname",
|
1240
|
-
|
1241
|
-
|
1242
|
-
|
1243
|
-
|
1244
|
-
|
1245
|
-
|
1246
|
-
|
1247
|
-
|
1248
|
-
#The first parameter is the hash contaning the description of the configuration similar to database.yml in Rails
|
1201
|
+
'database'=>"CoolBioSeqDB",
|
1202
|
+
'adapter'=>"jdbcmysql",
|
1203
|
+
'username'=>"YourUser",
|
1204
|
+
'password'=>"YourPassword"
|
1205
|
+
}
|
1206
|
+
},
|
1207
|
+
'development')
|
1208
|
+
|
1209
|
+
#The first parameter is the hash containing the configuration description; as with database.yml in Rails applications, you can declare different environments.
|
1210
|
+
#The second parameter is the environment to use: 'development', 'test', or 'production'.
|
1249
1211
|
|
1250
1212
|
#To store a sequence into the database you simply need a biosequence object.
|
1251
1213
|
biosql_database = Bio::SQL::Biodatabase.find(:first)
|
@@ -1264,35 +1226,35 @@ When you have your database up and running, you can connect to it in this way:
|
|
1264
1226
|
#retrieving a generic accession
|
1265
1227
|
bioseq = Bio::SQL.fetch_accession("YourAccession")
|
1266
1228
|
|
1267
|
-
#If you use biosequence objects, you will find all its method mapped to BioSQL sequences.
|
1229
|
+
#If you use biosequence objects, you will find all its methods mapped to BioSQL sequences.
|
1230
|
+
#But you can also access the models directly:
|
1268
1231
|
|
1269
|
-
#get the raw sequence associated with
|
1232
|
+
#get the raw sequence associated with your accession
|
1270
1233
|
bioseq.entry.biosequence
|
1271
1234
|
|
1272
|
-
#get the length of your sequence
|
1235
|
+
#get the length of your sequence; this is the explicit form of bioseq.length
|
1273
1236
|
bioseq.entry.biosequence.length
|
1274
1237
|
|
1275
|
-
#convert the sequence
|
1238
|
+
#convert the sequence into GenBank format
|
1276
1239
|
bioseq.to_biosequence.output(:genbank)
|
1277
1240
|
|
1278
|
-
BioSQL' ((<schema|URL:http://www.biosql.org/wiki/Schema_Overview>)) is not
|
1241
|
+
BioSQL's ((<schema|URL:http://www.biosql.org/wiki/Schema_Overview>)) is not very intuitive for beginners, so spend some time understanding it. In the end, if you know a little bit of Ruby on Rails, everything will go smoothly. You can find information on annotation mapping ((<here|URL:http://www.biosql.org/wiki/Annotation_Mapping>)).
|
1279
1242
|
ToDo: add examples from George. I remember he wrote a cool post on BioSQL and Rails.
|
1280
1243
|
|
1281
|
-
|
1282
1244
|
= PhyloXML
|
1283
1245
|
|
1284
1246
|
PhyloXML is an XML language for saving, analyzing and exchanging data of
|
1285
|
-
annotated phylogenetic trees. PhyloXML parser in BioRuby is implemented in
|
1286
|
-
Bio::PhyloXML::Parser and writer in Bio::PhyloXML::Writer.
|
1287
|
-
More information at www.phyloxml.org
|
1247
|
+
annotated phylogenetic trees. PhyloXML's parser in BioRuby is implemented in
|
1248
|
+
Bio::PhyloXML::Parser, and its writer in Bio::PhyloXML::Writer.
|
1249
|
+
More information can be found at ((<www.phyloxml.org|URL:http://www.phyloxml.org>)).
|
1288
1250
|
|
1289
1251
|
== Requirements
|
1290
1252
|
|
1291
|
-
In addition to BioRuby
|
1253
|
+
In addition to BioRuby, you need the libxml Ruby bindings. To install, execute:
|
1292
1254
|
|
1293
1255
|
% gem install -r libxml-ruby
|
1294
1256
|
|
1295
|
-
For more information see ((<URL:http://libxml.rubyforge.org/install.xml>))
|
1257
|
+
For more information, see the ((<libxml installer page|URL:http://libxml.rubyforge.org/install.xml>)).
|
1296
1258
|
|
1297
1259
|
== Parsing a file
|
1298
1260
|
|
@@ -1306,11 +1268,11 @@ For more information see ((<URL:http://libxml.rubyforge.org/install.xml>))
|
|
1306
1268
|
puts tree.name
|
1307
1269
|
end
|
1308
1270
|
|
1309
|
-
If there are several trees in the file, you can access the one you wish by
|
1271
|
+
If there are several trees in the file, you can access the one you wish by specifying its index:
|
1310
1272
|
|
1311
1273
|
tree = phyloxml[3]
|
1312
1274
|
|
1313
|
-
You can use all Bio::Tree methods on the tree, since PhyloXML::Tree inherits from Bio::Tree. For example,
|
1275
|
+
You can use all Bio::Tree methods on the tree, since PhyloXML::Tree inherits from Bio::Tree. For example,
|
1314
1276
|
|
1315
1277
|
tree.leaves.each do |node|
|
1316
1278
|
puts node.name
|
@@ -1338,7 +1300,7 @@ PhyloXML files can hold additional information besides phylogenies at the end of
|
|
1338
1300
|
|
1339
1301
|
== Retrieving data
|
1340
1302
|
|
1341
|
-
Here is an example of how to retrieve the scientific name of the clades.
|
1303
|
+
Here is an example of how to retrieve the scientific name of the clades included in each tree.
|
1342
1304
|
|
1343
1305
|
require 'bio'
|
1344
1306
|
|
@@ -1385,7 +1347,7 @@ Here is an example of how to retrieve the scientific name of the clades.
|
|
1385
1347
|
|
1386
1348
|
== The BioRuby example programs
|
1387
1349
|
|
1388
|
-
Some sample programs are stored in ./samples/ directory.
|
1350
|
+
Some sample programs are stored in the ./sample/ directory. For example, the na2aa.rb program (which translates a nucleic acid sequence into an amino acid sequence) can be run using:
|
1389
1351
|
|
1390
1352
|
./sample/na2aa.rb test/data/fasta/example1.txt
|
1391
1353
|
|
@@ -1404,21 +1366,21 @@ in this tutorial to doctest - more info upcoming.
|
|
1404
1366
|
|
1405
1367
|
See the BioRuby in anger Wiki. A lot of BioRuby's documentation exists in the
|
1406
1368
|
source code and unit tests. To really dive in you will need the latest source
|
1407
|
-
code tree. The embedded rdoc documentation can be viewed online at
|
1369
|
+
code tree. The embedded rdoc documentation for the BioRuby source code can be viewed online at
|
1408
1370
|
((<URL:http://bioruby.org/rdoc/>)).
|
1409
1371
|
|
1410
1372
|
== BioRuby Shell
|
1411
1373
|
|
1412
|
-
The BioRuby shell implementation
|
1374
|
+
The BioRuby shell implementation is located in ./lib/bio/shell. It is very interesting
|
1413
1375
|
as it uses IRB (Interactive Ruby), a powerful environment described in
|
1414
|
-
((<Programming Ruby's
|
1376
|
+
((<Programming Ruby's IRB chapter|URL:http://ruby-doc.org/docs/ProgrammingRuby/html/irb.html>)). IRB commands can be typed directly into the shell, e.g.
|
1415
1377
|
|
1416
1378
|
bioruby!> IRB.conf[:PROMPT_MODE]
|
1417
1379
|
==!> :PROMPT_C
|
1418
1380
|
|
1419
|
-
|
1381
|
+
Additionally, you may want to install the optional Ruby readline support
|
1420
1382
|
(on Debian, the libreadline-ruby package). To edit a previous line you may have to press
|
1421
|
-
line down (arrow
|
1383
|
+
line down (down arrow) first.
|
1422
1384
|
|
1423
1385
|
= Helpful tools
|
1424
1386
|
|
@@ -1428,7 +1390,7 @@ source code by clicking on class and method names.
|
|
1428
1390
|
cd bioruby/lib
|
1429
1391
|
rtags -R --vi
|
1430
1392
|
|
1431
|
-
For a tutorial see ((<URL:http://rtags.rubyforge.org/>))
|
1393
|
+
For a tutorial, see ((<the rtags site|URL:http://rtags.rubyforge.org/>)).
|
1432
1394
|
|
1433
1395
|
= APPENDIX
|
1434
1396
|
|
@@ -1440,9 +1402,9 @@ Please refer to KEGG_API.rd.ja (English version: ((<URL:http://www.genome.jp/keg
|
|
1440
1402
|
|
1441
1403
|
== Ruby Ensembl API
|
1442
1404
|
|
1443
|
-
Ruby Ensembl API is a
|
1405
|
+
The Ruby Ensembl API is a Ruby API to the Ensembl database. It is NOT currently
|
1444
1406
|
included in the BioRuby archives. To install it, see
|
1445
|
-
((<URL:http://wiki.github.com/jandot/ruby-ensembl-api>))
|
1407
|
+
((<the Ruby-Ensembl Github|URL:http://wiki.github.com/jandot/ruby-ensembl-api>))
|
1446
1408
|
for more information.
|
1447
1409
|
|
1448
1410
|
=== Gene Ontology (GO) through the Ruby Ensembl API
|
@@ -1455,7 +1417,7 @@ Gene Ontologies can be fetched through the Ruby Ensembl API package:
|
|
1455
1417
|
infile.each do |line|
|
1456
1418
|
accs = line.split(",") # Split the comma-sep.entries into an array
|
1457
1419
|
drosophila_acc = accs.shift # the first entry is the Drosophila acc
|
1458
|
-
mosq_acc = accs.shift # the second entry is
|
1420
|
+
mosq_acc = accs.shift # the second entry is your Mosq. acc
|
1459
1421
|
gene = Ensembl::Core::Gene.find_by_stable_id(drosophila_acc)
|
1460
1422
|
print "#{mosq_acc}"
|
1461
1423
|
gene.go_terms.each do |go|
|
@@ -1470,10 +1432,10 @@ homologues.
|
|
1470
1432
|
|
1471
1433
|
At the moment there is no easy way of accessing BioPerl from Ruby. The best way, perhaps, is to create a Perl server that is accessed through XML-RPC or SOAP.
|
1472
1434
|
|
1473
|
-
== Installing required external
|
1435
|
+
== Installing required external libraries
|
1474
1436
|
|
1475
1437
|
At present, no additional libraries are needed to use BioRuby, except if
|
1476
|
-
you are using Bio::PhyloXML module
|
1438
|
+
you are using the Bio::PhyloXML module; then you have to install libxml-ruby.
|
1477
1439
|
|
1478
1440
|
This may change, so keep an eye on the BioRuby website. Also, when
|
1479
1441
|
a package is missing, BioRuby should show an informative message.
|
@@ -1485,7 +1447,7 @@ carefully that come with each package.
|
|
1485
1447
|
|
1486
1448
|
=== Installing libxml-ruby
|
1487
1449
|
|
1488
|
-
The simplest way is to use
|
1450
|
+
The simplest way is to use the RubyGems packaging system:
|
1489
1451
|
|
1490
1452
|
gem install -r libxml-ruby
|
1491
1453
|
|
@@ -1493,13 +1455,13 @@ If you get `require': no such file to load - mkmf (LoadError) error then do
|
|
1493
1455
|
|
1494
1456
|
sudo apt-get install ruby-dev
|
1495
1457
|
|
1496
|
-
If you have other problems with installation, then see ((<URL:http://libxml.rubyforge.org/install.xml>))
|
1458
|
+
If you have other problems with installation, then see ((<URL:http://libxml.rubyforge.org/install.xml>)).
|
1497
1459
|
|
1498
1460
|
== Troubleshooting
|
1499
1461
|
|
1500
1462
|
* Error: in `require': no such file to load -- bio (LoadError)
|
1501
1463
|
|
1502
|
-
Ruby
|
1464
|
+
Ruby is failing to find the BioRuby libraries; add the BioRuby lib directory to the RUBYLIB path, or pass
|
1503
1465
|
it to the interpreter with the -I option. For example:
|
1504
1466
|
|
1505
1467
|
ruby -I$BIORUBYPATH/lib yourprogram.rb
|
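The same fix can also be applied from inside a script by prepending the library directory to `$LOAD_PATH`, which is what `-I` does. This is a sketch that assumes the environment variable BIORUBYPATH points at a BioRuby source tree and falls back to the current directory if it is unset. The `rescue` only reports the failure instead of aborting.

```ruby
# Sketch: in-script equivalent of "ruby -I$BIORUBYPATH/lib", assuming
# BIORUBYPATH points at a BioRuby source tree (falls back to "." if unset).
libdir = File.expand_path(File.join(ENV.fetch("BIORUBYPATH", "."), "lib"))
$LOAD_PATH.unshift(libdir) unless $LOAD_PATH.include?(libdir)

begin
  require 'bio'
  puts "BioRuby loaded"
rescue LoadError => e
  warn "BioRuby still not found under #{libdir}: #{e.message}"
end
```

Using RUBYLIB or `-I` keeps scripts free of path handling; the in-script form is mainly handy for quick experiments.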