bio 1.4.1 → 1.4.2

Files changed (61)
  1. data/ChangeLog +954 -0
  2. data/KNOWN_ISSUES.rdoc +40 -5
  3. data/README.rdoc +36 -35
  4. data/RELEASE_NOTES.rdoc +87 -59
  5. data/bioruby.gemspec +24 -2
  6. data/doc/RELEASE_NOTES-1.4.1.rdoc +104 -0
  7. data/doc/Tutorial.rd +162 -200
  8. data/doc/Tutorial.rd.html +149 -146
  9. data/lib/bio.rb +1 -0
  10. data/lib/bio/appl/blast.rb +1 -1
  11. data/lib/bio/appl/blast/ddbj.rb +26 -34
  12. data/lib/bio/appl/blast/genomenet.rb +21 -11
  13. data/lib/bio/db/embl/sptr.rb +193 -21
  14. data/lib/bio/db/fasta.rb +1 -1
  15. data/lib/bio/db/fastq.rb +14 -0
  16. data/lib/bio/db/fastq/format_fastq.rb +2 -2
  17. data/lib/bio/db/genbank/ddbj.rb +1 -2
  18. data/lib/bio/db/genbank/format_genbank.rb +1 -1
  19. data/lib/bio/db/medline.rb +1 -0
  20. data/lib/bio/db/newick.rb +3 -1
  21. data/lib/bio/db/pdb/pdb.rb +9 -9
  22. data/lib/bio/db/pdb/residue.rb +2 -2
  23. data/lib/bio/io/ddbjrest.rb +344 -0
  24. data/lib/bio/io/ncbirest.rb +121 -1
  25. data/lib/bio/location.rb +2 -2
  26. data/lib/bio/reference.rb +3 -4
  27. data/lib/bio/shell/plugin/entry.rb +7 -3
  28. data/lib/bio/shell/plugin/ncbirest.rb +5 -1
  29. data/lib/bio/util/restriction_enzyme.rb +3 -0
  30. data/lib/bio/util/restriction_enzyme/dense_int_array.rb +195 -0
  31. data/lib/bio/util/restriction_enzyme/range/sequence_range.rb +7 -7
  32. data/lib/bio/util/restriction_enzyme/range/sequence_range/calculated_cuts.rb +57 -18
  33. data/lib/bio/util/restriction_enzyme/range/sequence_range/fragment.rb +2 -2
  34. data/lib/bio/util/restriction_enzyme/sorted_num_array.rb +219 -0
  35. data/lib/bio/version.rb +1 -1
  36. data/sample/test_restriction_enzyme_long.rb +4403 -0
  37. data/test/data/fasta/EFTU_BACSU.fasta +8 -0
  38. data/test/data/genbank/CAA35997.gp +48 -0
  39. data/test/data/genbank/SCU49845.gb +167 -0
  40. data/test/data/litdb/1717226.litdb +13 -0
  41. data/test/data/pir/CRAB_ANAPL.pir +6 -0
  42. data/test/functional/bio/appl/blast/test_remote.rb +93 -0
  43. data/test/functional/bio/appl/test_blast.rb +61 -0
  44. data/test/functional/bio/io/test_ddbjrest.rb +47 -0
  45. data/test/functional/bio/test_command.rb +3 -3
  46. data/test/unit/bio/db/embl/test_sptr.rb +6 -6
  47. data/test/unit/bio/db/embl/test_uniprot_new_part.rb +208 -0
  48. data/test/unit/bio/db/genbank/test_common.rb +274 -0
  49. data/test/unit/bio/db/genbank/test_genbank.rb +401 -0
  50. data/test/unit/bio/db/genbank/test_genpept.rb +81 -0
  51. data/test/unit/bio/db/pdb/test_pdb.rb +3287 -11
  52. data/test/unit/bio/db/test_fasta.rb +34 -12
  53. data/test/unit/bio/db/test_fastq.rb +26 -0
  54. data/test/unit/bio/db/test_litdb.rb +95 -0
  55. data/test/unit/bio/db/test_medline.rb +1 -0
  56. data/test/unit/bio/db/test_nbrf.rb +82 -0
  57. data/test/unit/bio/db/test_newick.rb +22 -4
  58. data/test/unit/bio/test_reference.rb +35 -0
  59. data/test/unit/bio/util/restriction_enzyme/test_dense_int_array.rb +201 -0
  60. data/test/unit/bio/util/restriction_enzyme/test_sorted_num_array.rb +281 -0
  61. metadata +44 -38
@@ -0,0 +1,104 @@
1
+ = BioRuby 1.4.1 RELEASE NOTES
2
+
3
+ A lot of changes have been made to the BioRuby 1.4.1 after the version 1.4.0
4
+ is released. This document describes important and/or incompatible changes
5
+ since the BioRuby 1.4.0 release.
6
+
7
+ For known problems, see KNOWN_ISSUES.rdoc.
8
+
9
+ == New features
10
+
11
+ === PAML Codeml support is significantly improved
12
+
13
+ PAML Codeml result parser is completely rewritten and is significantly
14
+ improved. The code is developed by Pjotr Prins.
15
+
16
+ === KEGG PATHWAY and KEGG MODULE parser
17
+
18
+ Parsers for KEGG PATHWAY and KEGG MODULE data are added. The code is developed
19
+ by Kozo Nishida and Toshiaki Katayama.
20
+
21
+ === Bio::KEGG improvements
22
+
23
+ Following new methods are added.
24
+
25
+ * Bio::KEGG::GENES#keggclass, keggclasses, names_as_array, names,
26
+ motifs_as_strings, motifs_as_hash, motifs
27
+ * Bio::KEGG::GENOME#original_databases
28
+
29
+ === Test codes are added and improved.
30
+
31
+ Test codes are added and improved. They are developed by Kazuhiro Hayashi,
32
+ Kozo Nishida, John Prince, and Naohisa Goto.
33
+
34
+ === Other new methods
35
+
36
+ * Bio::Fastq#mask
37
+ * Bio::Sequence#output_fasta
38
+ * Bio::ClustalW::Report#get_sequence
39
+ * Bio::Reference#==
40
+ * Bio::Location#==
41
+ * Bio::Locations#==
42
+ * Bio::FastaNumericFormat#to_biosequence
43
+
44
+ == Bug fixes
45
+
46
+ === Bio::Tree
47
+
48
+ Following methods did not work correctly.
49
+
50
+ * Bio::Tree#collect_edge!
51
+ * Bio::Tree#remove_edge_if
52
+
53
+ === Bio::KEGG::GENES and Bio::KEGG::GENOME
54
+
55
+ * Fixed bugs in Bio::KEGG::GENES#pathway.
56
+ * Fixed parser errors due to the format changes of KEGG GENES and KEGG GENOME.
57
+
58
+ === Other bug fixes
59
+
60
+ * In Bio::Command, changed not to call fork(2) on platforms that do not
61
+ support it.
62
+ * Bio::MEDLINE#initialize should handle continuation of lines.
63
+ * Typo and a missing field in Bio::GO::GeneAssociation#to_str.
64
+ * Bug fix of Bio::FastaNumericFormat#to_biosequence.
65
+ * Fixed UniProt GN parsing issue in Bio::SPTR.
66
+
67
+ == Incompatible changes
68
+
69
+ === Bio::PAML::Codeml::Report
70
+
71
+ The code is completely rewritten. See the RDoc for details.
72
+
73
+ === Bio::KEGG::ORTHOLOGY
74
+
75
+ Bio::KEGG::ORTHOLOGY#pathways is changed to return a hash. The old pathway
76
+ method is renamed to pathways_in_keggclass for compatibility.
77
+
78
+ === Bio::AAindex2
79
+
80
+ Bio::AAindex2 now copies each symmetric element for lower triangular matrix
81
+ to the upper right part, because the Matrix class in Ruby 1.9.2 no longer
82
+ accepts any dimension mismatches. We think the previous behavior is a bug.
83
+
84
+ === Bio::MEDLINE
85
+
86
+ Bio::MEDLINE#reference no longer puts empty values in the returned
87
+ Bio::Reference object. We think the previous behavior is a bug.
88
+ We also think the effect is very small.
89
+
90
+ == Known issues
91
+
92
+ The following issues are added or updated. See KNOWN_ISSUES.rdoc for other
93
+ already known issues.
94
+
95
+ === String escaping of command-line arguments in Ruby 1.9.X on Windows
96
+
97
+ After BioRuby 1.4.1, in Ruby 1.9.X running on Windows, escaping of
98
+ command-line arguments are processed by the Ruby interpreter. Before BioRuby
99
+ 1.4.0, the escaping is executed in Bio::Command#escape_shell_windows, and
100
+ the behavior is different from the Ruby interpreter's one.
101
+
102
+ Currently, due to the change, test/functional/bio/test_command.rb may fail
103
+ on Windows with Ruby 1.9.X.
104
+
@@ -30,7 +30,7 @@
30
30
  #
31
31
  #
32
32
 
33
- bioruby> $: << '../lib'
33
+ bioruby> $: << '../lib' # make sure rubydoctest finds bioruby/lib
34
34
 
35
35
  =begin
36
36
  #doctest Testing bioruby
@@ -38,19 +38,19 @@ bioruby> $: << '../lib'
38
38
  = BioRuby Tutorial
39
39
 
40
40
  * Copyright (C) 2001-2003 KATAYAMA Toshiaki <k .at. bioruby.org>
41
- * Copyright (C) 2005-2010 Pjotr Prins, Naohisa Goto and others
41
+ * Copyright (C) 2005-2011 Pjotr Prins, Naohisa Goto and others
42
42
 
43
- This document was last modified: 2010/01/08
44
- Current editor: Pjotr Prins <p .at. bioruby.org>
43
+ This document was last modified: 2011/03/24
44
+ Current editor: Michael O'Keefe <okeefm (at) rpi (dot) edu>
45
45
 
46
- The latest version resides in the GIT source code repository: ./doc/((<Tutorial.rd|URL:http://github.com/pjotrp/bioruby/raw/documentation/doc/Tutorial.rd>)).
46
+ The latest version resides in the GIT source code repository: ./doc/((<Tutorial.rd|URL:https://github.com/bioruby/bioruby/blob/master/doc/Tutorial.rd>)).
47
47
 
48
48
  == Introduction
49
49
 
50
50
  This is a tutorial for using Bioruby. A basic knowledge of Ruby is required.
51
- If you want to know more about the programming langauge Ruby we recommend the
51
+ If you want to know more about the programming language, we recommend the
52
52
  latest Ruby book ((<Programming Ruby|URL:http://www.pragprog.com/titles/ruby>))
53
- by Dave Thomas and Andy Hunt - the first edition is online
53
+ by Dave Thomas and Andy Hunt - the first edition can be read online
54
54
  ((<here|URL:http://www.ruby-doc.org/docs/ProgrammingRuby/>)).
55
55
 
56
56
  For BioRuby you need to install Ruby and the BioRuby package on your computer
@@ -60,7 +60,7 @@ version it has with the
60
60
 
61
61
  % ruby -v
62
62
 
63
- command. Showing something like:
63
+ command. You should see something like:
64
64
 
65
65
  ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]
66
66
 
@@ -70,7 +70,7 @@ manager. For more information see the
70
70
 
71
71
  With Ruby download and install Bioruby using the links on the
72
72
  ((<Bioruby|URL:http://bioruby.org/>)) website. The recommended installation is via
73
- Ruby gems:
73
+ RubyGems:
74
74
 
75
75
  gem install bio
76
76
 
@@ -83,10 +83,13 @@ documentation can be viewed online at
83
83
 
84
84
  == Trying Bioruby
85
85
 
86
- Bioruby comes with its own shell. After unpacking the sources run the
87
- following command
86
+ Bioruby comes with its own shell. After unpacking the sources run one of the following commands:
88
87
 
89
- ./bin/bioruby or
88
+ bioruby
89
+
90
+ or, from the source tree
91
+
92
+ cd bioruby
90
93
  ruby -I lib bin/bioruby
91
94
 
92
95
  and you should see a prompt
@@ -110,11 +113,11 @@ question to the mailing list. BioRuby developers usually try to help.
110
113
 
111
114
  The Bio::Sequence class allows the usual sequence transformations and
112
115
  translations. In the example below the DNA sequence "atgcatgcaaaa" is
113
- converted into the complemental strand, spliced into a subsequence,
114
- next the nucleic acid composition is calculated and the sequence is
116
+ converted into the complemental strand and spliced into a subsequence;
117
+ next, the nucleic acid composition is calculated and the sequence is
115
118
  translated into the amino acid sequence, the molecular weight
116
- calculated, and so on. When translating into amino acid sequences the
117
- frame can be specified and optionally the condon table selected (as
119
+ calculated, and so on. When translating into amino acid sequences, the
120
+ frame can be specified and optionally the codon table selected (as
118
121
  defined in codontable.rb).
119
122
 
120
123
  bioruby> seq = Bio::Sequence::NA.new("atgcatgcaaaa")
@@ -124,7 +127,7 @@ defined in codontable.rb).
124
127
  bioruby> seq.complement
125
128
  ==> "ttttgcatgcat"
126
129
 
127
- bioruby> seq.subseq(3,8) # gets subsequence of positions 3 to 8
130
+ bioruby> seq.subseq(3,8) # gets subsequence of positions 3 to 8 (starting from 1)
128
131
  ==> "gcatgc"
129
132
  bioruby> seq.gc_percent
130
133
  ==> 33
@@ -169,11 +172,11 @@ Windows). For example
169
172
  % ri p
170
173
  % ri File.open
171
174
 
172
- Nucleic acid sequence is an object of Bio::Sequence::NA class, and
173
- amino acid sequence is an object of Bio::Sequence::AA class. Shared
175
+ Nucleic acid sequence are members of the Bio::Sequence::NA class, and
176
+ amino acid sequence are members of the Bio::Sequence::AA class. Shared
174
177
  methods are in the parent Bio::Sequence class.
175
178
 
176
- As Bio::Sequence class inherits Ruby's String class, you can use
179
+ As Bio::Sequence inherits Ruby's String class, you can use
177
180
  String class methods. For example, to get a subsequence, you can
178
181
  not only use subseq(from, to) but also String#[].
179
182
 
@@ -189,15 +192,14 @@ has index 0, for example:
189
192
 
190
193
  So when using String methods, you should subtract 1 from positions
191
194
  conventionally used in biology. (subseq method will throw an exception if you
192
- specify positions smaller than or equal to 0 for either one of the "from" or
193
- "to".)
195
+ specify positions smaller than or equal to 0 for either one of the "from" or "to".)
194
196
 
195
197
  The window_search(window_size, step_size) method shows a typical Ruby
196
198
  way of writing concise and clear code using 'closures'. Each sliding
197
199
  window creates a subsequence which is supplied to the enclosed block
198
200
  through a variable named +s+.
199
201
 
200
- * Show average percentage of GC content for 20 bases (stepping the default one base at a time)
202
+ * Show average percentage of GC content for 20 bases (stepping the default one base at a time):
201
203
 
202
204
  bioruby> seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa")
203
205
  ==> "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa"
@@ -268,8 +270,8 @@ For example:
268
270
 
269
271
  puts my_aaseq
270
272
 
271
- Save the program as na2aa.rb. Prepare a nucleic acid sequence
272
- described below and saves it as my_naseq.txt:
273
+ Save the program above as na2aa.rb. Prepare a nucleic acid sequence
274
+ described below and save it as my_naseq.txt:
273
275
 
274
276
  gtggcgatctttccgaaagcgatgactggagcgaagaaccaaagcagtgacatttgtctg
275
277
  atgccgcacgtaggcctgataagacgcggacagcgtcgcatcaggcatcttgtgcaaatg
@@ -288,7 +290,7 @@ Outputs
288
290
 
289
291
  VAIFPKAMTGAKNQSSDICLMPHVGLIRRGQRRIRHLVQMSDAA*
290
292
 
291
- You can also write this, a bit fanciful, as a one-liner script.
293
+ You can also write this, a bit fancifully, as a one-liner script.
292
294
 
293
295
  % ruby -r bio -e 'p Bio::Sequence::NA.new($<.read).translate' my_naseq.txt
294
296
 
@@ -301,7 +303,7 @@ sequence files. One generic example of the above can be found in
301
303
  We assume that you already have some GenBank data files. (If you don't,
302
304
  download some .seq files from ftp://ftp.ncbi.nih.gov/genbank/)
303
305
 
304
- As an example we fetch the ID, definition and sequence of each entry
306
+ As an example we will fetch the ID, definition and sequence of each entry
305
307
  from the GenBank format and convert it to FASTA. This is also an example
306
308
  script in the BioRuby distribution.
307
309
 
@@ -349,7 +351,7 @@ For example, in turn, reading FASTA format files:
349
351
  puts "naseq : " + f.naseq
350
352
  end
351
353
 
352
- In above two scripts, the first arguments of Bio::FlatFile.new are
354
+ In the above two scripts, the first arguments of Bio::FlatFile.new are
353
355
  database classes of BioRuby. This is expanded on in a later section.
354
356
 
355
357
  Again another option is to use the Bio::DB.open class:
@@ -408,12 +410,9 @@ very complicated:
408
410
 
409
411
  * Note: In this example Feature#assoc method makes a Hash from a
410
412
  feature object. It is useful because you can get data from the hash
411
- by using qualifiers as keys.
412
- (But there is a risk some information is lost when two or more
413
- qualifiers are the same. Therefore an Array is returned by
414
- Feature#feature)
413
+ by using qualifiers as keys. But there is a risk some information is lost when two or more qualifiers are the same. Therefore an Array is returned by Feature#feature.
415
414
 
416
- Bio::Sequence#splicing splices subsequence from nucleic acid sequence
415
+ Bio::Sequence#splicing splices subsequences from nucleic acid sequences
417
416
  according to location information used in GenBank, EMBL and DDBJ.
418
417
 
419
418
  When the specified translation table is different from the default
@@ -434,7 +433,7 @@ bio/location.rb.
434
433
  locs = Bio::Locations.new('join((8298.8300)..10206,1..855)')
435
434
  naseq.splicing(locs)
436
435
 
437
- You can also use the splicing method for amino acid sequences
436
+ You can also use this splicing method for amino acid sequences
438
437
  (Bio::Sequence::AA objects).
439
438
 
440
439
  * Splicing peptide from a protein (e.g. signal peptide)
@@ -450,11 +449,7 @@ the ./lib/bio/db directory of the BioRuby source tree.
450
449
 
451
450
  In many cases the Bio::DatabaseClass acts as a factory pattern
452
451
  and recognises the database type automatically - returning a
453
- parsed object. For example using Bio::FlatFile
454
-
455
- Bio::FlatFile class as described above. The first argument of the
456
- Bio::FlatFile.new is database class name in BioRuby (such as Bio::GenBank,
457
- Bio::KEGG::GENES and so on).
452
+ parsed object. For example using Bio::FlatFile class as described above. The first argument of the Bio::FlatFile.new is database class name in BioRuby (such as Bio::GenBank, Bio::KEGG::GENES and so on).
458
453
 
459
454
  ff = Bio::FlatFile.new(Bio::DatabaseClass, ARGF)
460
455
 
@@ -472,19 +467,18 @@ database class?
472
467
  p entry.seq # sequence data of the entry
473
468
  end
474
469
 
475
- An example that can take any input, filter using a regular expression to output
470
+ An example that can take any input, filter using a regular expression and output
476
471
  to a FASTA file can be found in sample/any2fasta.rb. With this technique it is
477
472
  possible to write a Unix type grep/sort pipe for sequence information. One
478
473
  example using scripts in the BIORUBY sample folder:
479
474
 
480
475
  fastagrep.rb '/At|Dm/' database.seq | fastasort.rb
481
476
 
482
- greps the database for Arabidopsis and Drosophila entries and sorts the output
483
- to FASTA.
477
+ greps the database for Arabidopsis and Drosophila entries and sorts the output to FASTA.
484
478
 
485
479
  Other methods to extract specific data from database objects can be
486
480
  different between databases, though some methods are common (see the
487
- guidelines for common methods as described in bio/db.rb).
481
+ guidelines for common methods in bio/db.rb).
488
482
 
489
483
  * entry_id --> gets ID of the entry
490
484
  * definition --> gets definition of the entry
@@ -495,16 +489,15 @@ guidelines for common methods as described in bio/db.rb).
495
489
  Refer to the documents of each database to find the exact naming
496
490
  of the included methods.
497
491
 
498
- In principal BioRuby uses the following conventions: when a method
499
- name is plural the method returns some object as an Array. For
492
+ In general, BioRuby uses the following conventions: when a method
493
+ name is plural, the method returns some object as an Array. For
500
494
  example, some classes have a "references" method which returns
501
495
  multiple Bio::Reference objects as an Array. And some classes have a
502
496
  "reference" method which returns a single Bio::Reference object.
503
497
 
504
498
  === Alignments (Bio::Alignment)
505
499
 
506
- Bio::Alignment class in bio/alignment.rb is a container class like Ruby's Hash,
507
- Array and BioPerl's Bio::SimpleAlign. A very simple example is:
500
+ The Bio::Alignment class in bio/alignment.rb is a container class like Ruby's Hash and Array classes and BioPerl's Bio::SimpleAlign. A very simple example is:
508
501
 
509
502
  bioruby> seqs = [ 'atgca', 'aagca', 'acgca', 'acgcg' ]
510
503
  bioruby> seqs = seqs.collect{ |x| Bio::Sequence::NA.new(x) }
@@ -536,18 +529,45 @@ Array and BioPerl's Bio::SimpleAlign. A very simple example is:
536
529
  factory = Bio::ClustalW.new
537
530
  a2 = a.do_align(factory)
538
531
 
532
+ Read a ClustalW or Muscle 'ALN' alignment file:
533
+
534
+ bioruby> aln = Bio::ClustalW::Report.new(File.read('../test/data/clustalw/example1.aln'))
535
+ bioruby> aln.header
536
+ ==> "CLUSTAL 2.0.9 multiple sequence alignment"
537
+
538
+ Fetch a sequence:
539
+
540
+ bioruby> seq = aln.get_sequence(1)
541
+ bioruby> seq.definition
542
+ ==> "gi|115023|sp|P10425|"
543
+
544
+ Get a partial sequence:
545
+
546
+ bioruby> seq.to_s[60..120]
547
+ ==> "LGYFNG-EAVPSNGLVLNTSKGLVLVDSSWDNKLTKELIEMVEKKFQKRVTDVIITHAHAD"
548
+
549
+ Show the full alignment residue match information for the sequences in the set:
550
+
551
+ bioruby> aln.match_line[60..120]
552
+ ==> " . **. . .. ::*: . * : : . .: .* * *"
553
+
554
+ Return a Bio::Alignment object:
555
+
556
+ bioruby> aln.alignment.consensus[60..120]
557
+ ==> "???????????SN?????????????D??????????L??????????????????H?H?D"
558
+
539
559
  == Restriction Enzymes (Bio::RE)
540
560
 
541
561
  BioRuby has extensive support for restriction enzymes (REs). It contains a full
542
562
  library of commonly used REs (from REBASE) which can be used to cut single
543
- stranded RNA or dubbel stranded DNA into fragments. To list all enzymes:
563
+ stranded RNA or double stranded DNA into fragments. To list all enzymes:
544
564
 
545
565
  rebase = Bio::RestrictionEnzyme.rebase
546
566
  rebase.each do |enzyme_name, info|
547
567
  p enzyme_name
548
568
  end
549
569
 
550
- and cut a sequence with an enzyme follow up with:
570
+ and to cut a sequence with an enzyme follow up with:
551
571
 
552
572
  res = seq.cut_with_enzyme('EcoRII', {:max_permutations => 0},
553
573
  {:view_ranges => true})
@@ -577,13 +597,14 @@ and cut a sequence with an enzyme follow up with:
577
597
  Let's start with a query.pep file which contains a sequence in FASTA
578
598
  format. In this example we are going to execute a homology search
579
599
  from a remote internet site or on your local machine. Note that you
580
- can use the ssearch program instead of fasta when you use them in your
600
+ can use the ssearch program instead of fasta when you use it in your
581
601
  local machine.
582
602
 
583
603
  === using FASTA in local machine
584
604
 
585
605
  Install the fasta program on your machine (the command name looks like
586
606
  fasta34. FASTA can be downloaded from ftp://ftp.virginia.edu/pub/fasta/).
607
+
587
608
  First, you must prepare your FASTA-formatted database sequence file
588
609
  target.pep and FASTA-formatted query.pep.
589
610
 
@@ -619,7 +640,7 @@ target.pep and FASTA-formatted query.pep.
619
640
  end
620
641
  end
621
642
 
622
- We named above script as f_search.rb. You can execute as follows:
643
+ We named above script f_search.rb. You can execute it as follows:
623
644
 
624
645
  % ./f_search.rb query.pep target.pep > f_search.out
625
646
 
@@ -630,14 +651,13 @@ Bio::Sequence#fasta method can be used.
630
651
  seq = ">test seq\nYQVLEEIGRGSFGSVRKVIHIPTKKLLVRKDIKYGHMNSKE"
631
652
  seq.fasta(factory)
632
653
 
633
- When you want to add options to FASTA command, you can set the
634
- third argument of Bio::Fasta.local method. For example, setting ktup to 1
635
- and getting top-10 hits:
654
+ When you want to add options to FASTA commands, you can set the
655
+ third argument of the Bio::Fasta.local method. For example, the following sets ktup to 1 and gets a list of the top 10 hits:
636
656
 
637
657
  factory = Bio::Fasta.local('fasta34', 'target.pep', '-b 10')
638
658
  factory.ktup = 1
639
659
 
640
- Bio::Fasta#query returns Bio::Fasta::Report object.
660
+ Bio::Fasta#query returns a Bio::Fasta::Report object.
641
661
  We can get almost all information described in FASTA report text
642
662
  with the Report object. For example, getting information for hits:
643
663
 
@@ -665,12 +685,11 @@ with the Report object. For example, getting information for hits:
665
685
  puts hit.lap_at # array of above four numbers
666
686
  end
667
687
 
668
- Most of above methods are common with the Bio::Blast::Report described
669
- below. Please refer to document of Bio::Fasta::Report class for
688
+ Most of above methods are common to the Bio::Blast::Report described
689
+ below. Please refer to the documentation of the Bio::Fasta::Report class for
670
690
  FASTA-specific details.
671
691
 
672
- If you need original output text of FASTA program you can use the "output"
673
- method of the factory object after the "query" method.
692
+ If you need the original output text of FASTA program you can use the "output" method of the factory object after the "query" method.
674
693
 
675
694
  report = factory.query(entry)
676
695
  puts factory.output
@@ -698,15 +717,15 @@ Available databases in GenomeNet:
698
717
  Select the databases you require. Next, give the search program from
699
718
  the type of query sequence and database.
700
719
 
701
- * When query is a amino acid sequence
720
+ * When query is an amino acid sequence
702
721
  * When protein database, program is "fasta".
703
722
  * When nucleic database, program is "tfasta".
704
723
 
705
724
  * When query is a nucleic acid sequence
706
725
  * When nucleic database, program is "fasta".
707
- * (When protein database, you would fail to search.)
726
+ * (When protein database, the search would fail.)
708
727
 
709
- For example:
728
+ For example, run:
710
729
 
711
730
  program = 'fasta'
712
731
  database = 'genes'
@@ -741,7 +760,7 @@ The parameter "program" is different from FASTA - as you can expect:
741
760
  Bio::BLAST uses "-m 7" XML output of BLAST by default when either
742
761
  XMLParser or REXML (both of them are XML parser libraries for Ruby -
743
762
  of the two XMLParser is the fastest) is installed on your computer. In
744
- Ruby version 1.8.0, or later, REXML is bundled with Ruby's
763
+ Ruby version 1.8.0 or later, REXML is bundled with Ruby's
745
764
  distribution.
746
765
 
747
766
  When no XML parser library is present, Bio::BLAST uses "-m 8" tabular
@@ -776,10 +795,10 @@ midline.
776
795
  end
777
796
 
778
797
  For simplicity and API compatibility, some information such as score
779
- are extracted from the first Hsp (High-scoring Segment Pair).
798
+ is extracted from the first Hsp (High-scoring Segment Pair).
780
799
 
781
800
  Check the documentation for Bio::Blast::Report to see what can be
782
- retrieved. For now suffice to state that Bio::Blast::Report has a
801
+ retrieved. For now suffice to say that Bio::Blast::Report has a
783
802
  hierarchical structure mirroring the general BLAST output stream:
784
803
 
785
804
  * In a Bio::Blast::Report object, @iterations is an array of
@@ -854,65 +873,12 @@ Bio::Blast::Report.new(or Bio::Blast::Default::Report.new):
854
873
 
855
874
  factory = Bio::Blast.remote(program, db, option, 'MYSITE')
856
875
 
857
- When you write above routines, please send to the BioRuby project and
858
- they may be included.
876
+ When you write above routines, please send them to the BioRuby project, and they may be included in future releases.
859
877
 
860
878
  == Generate a reference list using PubMed (Bio::PubMed)
861
- =end
862
- (EDITORs NOTE: examples in this section do not work and should be rewritten.)
863
-
864
- Below script is an example which seaches PubMed and creates a reference list.
865
-
866
- ARGV.each do |id|
867
- entry = Bio::PubMed.query(id) # searches PubMed and get entry
868
- medline = Bio::MEDLINE.new(entry) # creates Bio::MEDLINE object from entry text
869
- reference = medline.reference # converts into Bio::Reference object
870
- puts reference.bibtex # shows BibTeX formatted text
871
- end
872
-
873
- We named the script pmfetch.rb.
874
-
875
- % ./pmfetch.rb 11024183 10592278 10592173
876
-
877
- To give some PubMed ID (PMID) in arguments, the script retrieves informations
878
- from NCBI, parses MEDLINE format text, converts into BibTeX format and
879
- shows them.
880
-
881
- A keyword search is also available.
882
-
883
- #!/usr/bin/env ruby
884
-
885
- require 'bio'
886
-
887
- # Concatinates argument keyword list to a string
888
- keywords = ARGV.join(' ')
889
-
890
- # PubMed keyword search
891
- entries = Bio::PubMed.search(keywords)
892
-
893
- entries.each do |entry|
894
- medline = Bio::MEDLINE.new(entry) # creates Bio::MEDLINE object from text
895
- reference = medline.reference # converts into Bio::Reference object
896
- puts reference.bibtex # shows BibTeX format text
897
- end
898
-
899
- We named the script pmsearch.rb.
900
-
901
- % ./pmsearch.rb genome bioinformatics
902
-
903
- To give keywords in arguments, the script searches PubMed by given
904
- keywords and shows bibliography informations in a BibTex format. Other
905
- output formats are also avaialble like the bibitem method described
906
- below. Some journal formats like nature and nar can be used, but lack
907
- bold and italic font output.
908
-
909
- (EDITORs NOTE: do we have some simple object that can be queried for
910
- author, title etc.?)
911
- =begin
912
879
 
913
880
  Nowadays using NCBI E-Utils is recommended. Use Bio::PubMed.esearch
914
- and Bio::PubMed.efetch instead of above methods.
915
-
881
+ and Bio::PubMed.efetch.
916
882
 
917
883
  #!/usr/bin/env ruby
918
884
 
@@ -959,7 +925,7 @@ BibTeX format bibliography data to a file named genoinfo.bib.
959
925
 
960
926
  The BibTeX can be used with Tex or LaTeX to form bibliography
961
927
  information with your journal article. For more information
962
- on BibTex see (EDITORS NOTE: insert URL). A quick example:
928
+ on using BibTex see ((<BibTex HowTo site|URL:http://www.bibtex.org/Using/>)). A quick example:
963
929
 
964
930
  Save this to hoge.tex:
965
931
 
@@ -977,14 +943,13 @@ Then,
977
943
  % latex hoge # creates bibliography list
978
944
  % latex hoge # inserts correct bibliography reference
979
945
 
980
- Now, you get hoge.dvi and hoge.ps - the latter you can view any
981
- Postscript viewer.
946
+ Now, you get hoge.dvi and hoge.ps - the latter of which can be viewed with any Postscript viewer.
982
947
 
983
948
  === Bio::Reference#bibitem
984
949
 
985
950
  When you don't want to create a bib file, you can use
986
951
  Bio::Reference#bibitem method instead of Bio::Reference#bibtex.
987
- In above pmfetch.rb and pmsearch.rb scripts, change
952
+ In the above pmfetch.rb and pmsearch.rb scripts, change
988
953
 
989
954
  puts reference.bibtex
990
955
  to
@@ -1031,11 +996,11 @@ BioRuby and other projects' members (2002).
1031
996
  * Server-client model for getting entry from database via http.
1032
997
 
1033
998
  * BioSQL
1034
- * Schemas to store sequence data to relational database such as
999
+ * Schemas to store sequence data to relational databases such as
1035
1000
  MySQL and PostgreSQL, and methods to retrieve entries from the database.
1036
1001
 
1037
- Here we give a quick overview. Check out
1038
- ((<URL:http://obda.open-bio.org/>)) for more extensive details.
1002
+ This tutorial only gives a quick overview of OBDA. Check out
1003
+ ((<the OBDA site|URL:http://obda.open-bio.org>)) for more extensive details.
1039
1004
 
1040
1005
  == BioRegistry
1041
1006
 
@@ -1053,17 +1018,17 @@ when all local configulation files are not available.
1053
1018
  In the current BioRuby implementation all local configulation files
1054
1019
  are read. For databases with the same name settings encountered first
1055
1020
  are used. This means that if you don't like some settings of a
1056
- database in system global configuration file
1057
- (/etc/bioinformatics/seqdatabase.ini), you can easily override it by
1021
+ database in the system's global configuration file
1022
+ (/etc/bioinformatics/seqdatabase.ini), you can easily override them by
1058
1023
  writing settings to ~/.bioinformatics/seqdatabase.ini.
1059
1024
 
1060
1025
  The syntax of the configuration file is called a stanza format. For example
1061
1026
 
1062
1027
  [DatabaseName]
1063
1028
  protocol=ProtocolName
1064
- location=ServeName
1029
+ location=ServerName
1065
1030
 
1066
- You can write a description like above entry for every database.
1031
+ You can write a description like the above entry for every database.
1067
1032
 
1068
1033
  The database name is a local label for yourself, so you can name it
1069
1034
  freely and it can differ from the name of the actual databases. In the
@@ -1073,8 +1038,8 @@ connection to the database is tried sequentially with the order
  written in configuration files. However, this has not (yet) been
  implemented in BioRuby.
 
- In addition, for some protocol, you must set additional options
- other than locations (e.g. user name of MySQL). In the BioRegistory
+ In addition, for some protocols, you must set additional options
+ other than locations (e.g. user name for MySQL). In the BioRegistory
  specification, current available protocols are:
 
  * index-flat
@@ -1088,8 +1053,7 @@ In BioRuby, you can use index-flat, index-berkleydb, biofetch and biosql.
  Note that the BioRegistry specification sometimes gets updated and BioRuby
  does not always follow quickly.
 
- Here an example. Create a Bio::Registry object. It reads the configuration
- files:
+ Here is an example. It creates a Bio::Registry object and reads the configuration files:
 
  reg = Bio::Registry.new
 
@@ -1100,41 +1064,38 @@ files:
  entry = serv.get_by_id('AA2CG')
 
 
- The variable "serv" is a server object corresponding to the setting
- written in configuration files. The class of the object is one of
+ The variable "serv" is a server object corresponding to the settings
+ written in the configuration files. The class of the object is one of
  Bio::SQL, Bio::Fetch, and so on. Note that Bio::Registry#get_database("name")
  returns nil if no database is found.
 
- After that, you can use get_by_id method and some specific methods.
- Please refer to below documents.
+ After that, you can use the get_by_id method and some specific methods.
+ Please refer to the sections below for more information.
 
  == BioFlat
 
  BioFlat is a mechanism to create index files of flat files and to retrieve
  these entries fast. There are two index types. index-flat is a simple index
- performing binary search without using an external library of Ruby. index-berkeleydb
+ performing binary search without using any external libraries of Ruby. index-berkeleydb
  uses Berkeley DB for indexing - but requires installing bdb on your computer,
- as well as the BDB Ruby package. For creating the index itself, you can use
- br_bioflat.rb command bundled with BioRuby.
+ as well as the BDB Ruby package. To create the index itself, you can use br_bioflat.rb command bundled with BioRuby.
 
  % br_bioflat.rb --makeindex database_name [--format data_format] filename...
 
  The format can be omitted because BioRuby has autodetection. If that
- does not work you can try specifying data format as a name of BioRuby
- database class.
+ doesn't work, you can try specifying the data format as the name of a BioRuby database class.
 
  Search and retrieve data from database:
 
  % br_bioflat.rb database_name identifier
 
- For example, to create index of GenBank files gbbct*.seq and get entry
- from the database:
+ For example, to create an index of GenBank files gbbct*.seq and get the entry from the database:
 
  % br_bioflat.rb --makeindex my_bctdb --format GenBank gbbct*.seq
  % br_bioflat.rb my_bctdb A16STM262
 
  If you have Berkeley DB on your system and installed the bdb extension
- module of Ruby (see http://raa.ruby-lang.org/project/bdb/), you can
+ module of Ruby (see ((<the BDB project page|URL:http://raa.ruby-lang.org/project/bdb/>)) ), you can
  create and search indexes with Berkeley DB - a very fast alternative
  that uses little computer memory. When creating the index, use the
  "--makeindex-bdb" option instead of "--makeindex".
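
Note on the hunk above: the tutorial describes index-flat as a simple index searched by binary search. The sketch below illustrates that general idea only. It is not BioRuby's actual index file format or the br_bioflat.rb implementation; the sample data and the helper names build_index and fetch are hypothetical.

```ruby
require 'stringio'

# Hypothetical flat file with two FASTA-like entries (made-up data).
FLAT = ">B2\nGGCC\n>A1\nACGT\n"

# Build a sorted index of [identifier, byte offset, byte length] triples.
def build_index(io)
  index = []
  offset = 0
  io.each_line do |line|
    # A header line starts a new entry at the current byte offset.
    index << [line[1..-1].strip, offset, 0] if line.start_with?('>')
    # Every line (header included) counts toward the current entry's length.
    index.last[2] += line.bytesize unless index.empty?
    offset += line.bytesize
  end
  index.sort_by { |id, _, _| id }
end

# Binary search the sorted index, then seek straight to the entry.
def fetch(io, index, id)
  hit = index.bsearch { |entry| id <=> entry[0] }
  return nil unless hit
  io.seek(hit[1])
  io.read(hit[2])
end

io = StringIO.new(FLAT)
index = build_index(io)
puts fetch(io, index, 'A1')
```

Because only identifiers and offsets are kept in the index, a lookup reads just the requested entry from the flat file rather than scanning it.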
@@ -1145,12 +1106,12 @@ that uses little computer memory. When creating the index, use the
 
  Note: this section is an advanced topic
 
- BioFetch is a database retrieval mechanism via CGI. CGI Parameters,
- options and error codes are standardized. There client access via
+ BioFetch is a database retrieval mechanism via CGI. CGI Parameters,
+ options and error codes are standardized. Client access via
  http is possible giving the database name, identifiers and format to
  retrieve entries.
 
- The BioRuby project has a BioFetch server in bioruby.org. It uses
+ The BioRuby project has a BioFetch server at bioruby.org. It uses
  GenomeNet's DBGET system as a backend. The source code of the
  server is in sample/ directory. Currently, there are only two
  BioFetch servers in the world: bioruby.org and EBI.
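
Note on the hunk above: the tutorial says a BioFetch client passes the database name, identifiers and format over http. A minimal sketch of composing such a request URL follows. The endpoint path and the parameter names (db, id, format, style) are assumptions based on common BioFetch usage and are not confirmed by this diff; biofetch_uri is a hypothetical helper, and nothing is sent over the network.

```ruby
require 'uri'

# Assumed endpoint; substitute the BioFetch server you actually use.
BIOFETCH_URL = 'http://bioruby.org/cgi-bin/biofetch.rb'

# Compose a BioFetch-style query URI (parameter names are assumptions).
def biofetch_uri(db, ids, format = 'default')
  uri = URI.parse(BIOFETCH_URL)
  uri.query = URI.encode_www_form(
    db: db,
    id: Array(ids).join(','), # several identifiers are joined with commas
    format: format,
    style: 'raw'
  )
  uri
end

puts biofetch_uri('genbank', 'AA2CG')
```

In BioRuby itself the same lookup goes through Bio::Fetch or the registry, as the surrounding tutorial text shows; the sketch only makes the underlying request shape visible.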
@@ -1176,8 +1137,8 @@ Here are some methods to retrieve entries from our BioFetch server.
  serv = reg.get_database('genbank')
  entry = serv.get_by_id('AA2CG')
 
- If you want to use (4), you, obviously, have to include some settings
- in seqdatabase.ini. E.g.
+ If you want to use (4), you have to include some settings
+ in seqdatabase.ini. For example:
 
  [genbank]
  protocol=biofetch
@@ -1186,11 +1147,11 @@ in seqdatabase.ini. E.g.
 
  === The combination of BioFetch, Bio::KEGG::GENES and Bio::AAindex1
 
- Bioinformatics is often about glueing things together. Here we give an
- example to get the bacteriorhodopsin gene (VNG1467G) of the archaea
- Halobacterium from KEGG GENES database and to get alpha-helix index
+ Bioinformatics is often about gluing things together. Here is an
+ example that gets the bacteriorhodopsin gene (VNG1467G) of the archaea
+ Halobacterium from KEGG GENES database and gets alpha-helix index
  data (BURA740101) from the AAindex (Amino acid indices and similarity
- matrices) database, and show the helix score for each 15-aa length
+ matrices) database, and shows the helix score for each 15-aa length
  overlapping window.
 
  #!/usr/bin/env ruby
@@ -1212,16 +1173,16 @@ overlapping window.
  position += 1
  end
 
- The special method Bio::Fetch.query uses preset BioFetch server
- in bioruby.org. (The server internally get data from GenomeNet.
+ The special method Bio::Fetch.query uses the preset BioFetch server
+ at bioruby.org. (The server internally gets data from GenomeNet.
  Because the KEGG/GENES database and AAindex database are not available
- from other BioFetch servers, we used bioruby.org server with
+ from other BioFetch servers, we used the bioruby.org server with
  Bio::Fetch.query method.)
 
  == BioSQL
 
- BioSQL is a well known schema to store and retrive biological sequences using a RDBMS like PostgreSQL or MySQL; note that SQLite is not supported.
- First of all, you must install a database engine or have access to a remote one. Then create the schema and populate with the taxonomy. You can follow the ((<Official Guide|URL:http://code.open-bio.org/svnweb/index.cgi/biosql/view/biosql-schema/trunk/INSTALL>)) .
+ BioSQL is a well known schema to store and retrive biological sequences using a RDBMS like PostgreSQL or MySQL: note that SQLite is not supported.
+ First of all, you must install a database engine or have access to a remote one. Then create the schema and populate with the taxonomy. You can follow the ((<Official Guide|URL:http://code.open-bio.org/svnweb/index.cgi/biosql/view/biosql-schema/trunk/INSTALL>)) to accomplish these steps.
  Next step is to install these gems:
  * ActiveRecord
  * CompositePrimaryKeys (Rails doesn't handle by default composite primary keys)
@@ -1230,22 +1191,23 @@ Next step is to install these gems:
 
  You can find ActiveRecord's models in /bioruby/lib/bio/io/biosql
 
- When you have your database up and running, you can connect to it in this way:
+ When you have your database up and running, you can connect to it like this:
 
  #!/usr/bin/env ruby
 
  require 'bio'
 
  connection = Bio::SQL.establish_connection({'development'=>{'hostname'=>"YourHostname",
- 'database'=>"CoolBioSeqDB",
- 'adapter'=>"jdbcmysql",
- 'username'=>"YourUser",
- 'password'=>"YouPassword"
- }
- },
- 'development')
-
- #The first parameter is the hash contaning the description of the configuration similar to database.yml in Rails application, you can declare different environment. The second parameter is the environment to use: 'development', 'test', 'production'.
+ 'database'=>"CoolBioSeqDB",
+ 'adapter'=>"jdbcmysql",
+ 'username'=>"YourUser",
+ 'password'=>"YouPassword"
+ }
+ },
+ 'development')
+
+ #The first parameter is the hash contaning the description of the configuration; similar to database.yml in Rails applications, you can declare different environment.
+ #The second parameter is the environment to use: 'development', 'test', or 'production'.
 
  #To store a sequence into the database you simply need a biosequence object.
  biosql_database = Bio::SQL::Biodatabase.find(:first)
@@ -1264,35 +1226,35 @@ When you have your database up and running, you can connect to it in this way:
  #retriving a generic accession
  bioseq = Bio::SQL.fetch_accession("YouAccession")
 
- #If you use biosequence objects, you will find all its method mapped to BioSQL sequences. But you can also access to the models directly:
+ #If you use biosequence objects, you will find all its method mapped to BioSQL sequences.
+ #But you can also access to the models directly:
 
- #get the raw sequence associated with you accession
+ #get the raw sequence associated with your accession
  bioseq.entry.biosequence
 
- #get the length of your sequence, this is the explicit form of bioseq.length
+ #get the length of your sequence; this is the explicit form of bioseq.length
  bioseq.entry.biosequence.length
 
- #convert the sequence in GenBank format
+ #convert the sequence into GenBank format
  bioseq.to_biosequence.output(:genbank)
 
- BioSQL' ((<schema|URL:http://www.biosql.org/wiki/Schema_Overview>)) is not so intuitive at the beginning, spend some time on understanding it, in the end if you know a little bit of rails everything will go smootly. You can find information to Annotation ((<here|URL:http://www.biosql.org/wiki/Annotation_Mapping>))
+ BioSQL's ((<schema|URL:http://www.biosql.org/wiki/Schema_Overview>)) is not very intuitive for beginners, so spend some time on understanding it. In the end if you know a little bit of Ruby on Rails, everything will go smoothly. You can find information on Annotation ((<here|URL:http://www.biosql.org/wiki/Annotation_Mapping>)).
  ToDo: add exemaples from George. I remember he did some cool post on BioSQL and Rails.
 
-
  = PhyloXML
 
  PhyloXML is an XML language for saving, analyzing and exchanging data of
- annotated phylogenetic trees. PhyloXML parser in BioRuby is implemented in
- Bio::PhyloXML::Parser and writer in Bio::PhyloXML::Writer.
- More information at www.phyloxml.org
+ annotated phylogenetic trees. PhyloXML's parser in BioRuby is implemented in
+ Bio::PhyloXML::Parser, and its writer in Bio::PhyloXML::Writer.
+ More information can be found at ((<www.phyloxml.org|URL:http://www.phyloxml.org>)).
 
  == Requirements
 
- In addition to BioRuby library you need a libxml ruby bindings. To install:
+ In addition to BioRuby, you need the libxml Ruby bindings. To install, execute:
 
  % gem install -r libxml-ruby
 
- For more information see ((<URL:http://libxml.rubyforge.org/install.xml>))
+ For more information see the ((<libxml installer page|URL:http://libxml.rubyforge.org/install.xml>))
 
  == Parsing a file
 
@@ -1306,11 +1268,11 @@ For more information see ((<URL:http://libxml.rubyforge.org/install.xml>))
  puts tree.name
  end
 
- If there are several trees in the file, you can access the one you wish by an index
+ If there are several trees in the file, you can access the one you wish by specifying its index:
 
  tree = phyloxml[3]
 
- You can use all Bio::Tree methods on the tree, since PhyloXML::Tree inherits from Bio::Tree. For example,
+ You can use all Bio::Tree methods on the tree, since PhyloXML::Tree inherits from Bio::Tree. For example,
 
  tree.leaves.each do |node|
  puts node.name
@@ -1338,7 +1300,7 @@ PhyloXML files can hold additional information besides phylogenies at the end of
 
  == Retrieving data
 
- Here is an example of how to retrieve the scientific name of the clades.
+ Here is an example of how to retrieve the scientific name of the clades included in each tree.
 
  require 'bio'
 
@@ -1385,7 +1347,7 @@ Here is an example of how to retrieve the scientific name of the clades.
 
  == The BioRuby example programs
 
- Some sample programs are stored in ./samples/ directory. Run for example:
+ Some sample programs are stored in ./samples/ directory. For example, the n2aa.rb program (transforms a nucleic acid sequence into an amino acid sequence) can be run using:
 
  ./sample/na2aa.rb test/data/fasta/example1.txt
 
@@ -1404,21 +1366,21 @@ in this tutorial to doctest - more info upcoming.
 
  See the BioRuby in anger Wiki. A lot of BioRuby's documentation exists in the
  source code and unit tests. To really dive in you will need the latest source
- code tree. The embedded rdoc documentation can be viewed online at
+ code tree. The embedded rdoc documentation for the BioRuby source code can be viewed online at
  ((<URL:http://bioruby.org/rdoc/>)).
 
  == BioRuby Shell
 
- The BioRuby shell implementation you find in ./lib/bio/shell. It is very interesting
+ The BioRuby shell implementation is located in ./lib/bio/shell. It is very interesting
  as it uses IRB (the Ruby intepreter) which is a powerful environment described in
- ((<Programming Ruby's irb chapter|URL:http://ruby-doc.org/docs/ProgrammingRuby/html/irb.html>)). IRB commands can directly be typed in the shell, e.g.
+ ((<Programming Ruby's IRB chapter|URL:http://ruby-doc.org/docs/ProgrammingRuby/html/irb.html>)). IRB commands can be typed directly into the shell, e.g.
 
  bioruby!> IRB.conf[:PROMPT_MODE]
  ==!> :PROMPT_C
 
- optionally you also may want to install the optional Ruby readline support -
+ Additionally, you also may want to install the optional Ruby readline support -
  with Debian libreadline-ruby. To edit a previous line you may have to press
- line down (arrow down) first.
+ line down (down arrow) first.
 
  = Helpful tools
 
@@ -1428,7 +1390,7 @@ source code by clicking on class and method names.
  cd bioruby/lib
  rtags -R --vi
 
- For a tutorial see ((<URL:http://rtags.rubyforge.org/>))
+ For a tutorial see ((<here|URL:http://rtags.rubyforge.org/>))
 
  = APPENDIX
 
@@ -1440,9 +1402,9 @@ Please refer to KEGG_API.rd.ja (English version: ((<URL:http://www.genome.jp/keg
 
  == Ruby Ensembl API
 
- Ruby Ensembl API is a ruby API to the Ensembl database. It is NOT currently
+ The Ruby Ensembl API is a Ruby API to the Ensembl database. It is NOT currently
  included in the BioRuby archives. To install it, see
- ((<URL:http://wiki.github.com/jandot/ruby-ensembl-api>))
+ ((<the Ruby-Ensembl Github|URL:http://wiki.github.com/jandot/ruby-ensembl-api>))
  for more information.
 
  === Gene Ontology (GO) through the Ruby Ensembl API
@@ -1455,7 +1417,7 @@ Gene Ontologies can be fetched through the Ruby Ensembl API package:
  infile.each do |line|
  accs = line.split(",") # Split the comma-sep.entries into an array
  drosphila_acc = accs.shift # the first entry is the Drosophila acc
- mosq_acc = accs.shift # the second entry is you Mosq. acc
+ mosq_acc = accs.shift # the second entry is your Mosq. acc
  gene = Ensembl::Core::Gene.find_by_stable_id(drosophila_acc)
  print "#{mosq_acc}"
  gene.go_terms.each do |go|
@@ -1470,10 +1432,10 @@ homologues.
 
  At the moment there is no easy way of accessing BioPerl from Ruby. The best way, perhaps, is to create a Perl server that gets accessed through XML/RPC or SOAP.
 
- == Installing required external library
+ == Installing required external libraries
 
  At this point for using BioRuby no additional libraries are needed, except if
- you are using Bio::PhyloXML module. Then you have to install libxml-ruby.
+ you are using the Bio::PhyloXML module; then you have to install libxml-ruby.
 
  This may change, so keep an eye on the Bioruby website. Also when
  a package is missing BioRuby should show an informative message.
@@ -1485,7 +1447,7 @@ carefully that come with each package.
 
  === Installing libxml-ruby
 
- The simplest way is to use gem packaging system.
+ The simplest way is to use the RubyGems packaging system:
 
  gem install -r libxml-ruby
 
@@ -1493,13 +1455,13 @@ If you get `require': no such file to load - mkmf (LoadError) error then do
 
  sudo apt-get install ruby-dev
 
- If you have other problems with installation, then see ((<URL:http://libxml.rubyforge.org/install.xml>))
+ If you have other problems with installation, then see ((<URL:http://libxml.rubyforge.org/install.xml>)).
 
  == Trouble shooting
 
  * Error: in `require': no such file to load -- bio (LoadError)
 
- Ruby fails to find the BioRuby libraries - add it to the RUBYLIB path, or pass
+ Ruby is failing to find the BioRuby libraries - add it to the RUBYLIB path, or pass
  it to the interpeter. For example:
 
  ruby -I$BIORUBYPATH/lib yourprogram.rb
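
Note on the hunk above: the same effect as the -I option can usually be had from inside a script by prepending the library directory to Ruby's load path. This is a minimal sketch, assuming the BIORUBYPATH environment variable points at a BioRuby source checkout (it falls back to the current directory if unset); it is not part of the tutorial being diffed.

```ruby
# Assumption: BIORUBYPATH names the root of a BioRuby checkout.
lib_dir = File.expand_path('lib', ENV.fetch('BIORUBYPATH', '.'))

# Prepend the directory so `require 'bio'` searches it first.
$LOAD_PATH.unshift(lib_dir) unless $LOAD_PATH.include?(lib_dir)

begin
  require 'bio'
  puts 'BioRuby loaded'
rescue LoadError => e
  # Reached when the library is neither installed nor under lib_dir.
  warn "BioRuby still not found: #{e.message}"
end
```

Setting RUBYLIB or passing -I keeps the path out of the script, which is usually preferable for shared code; the in-script form is handy for one-off experiments.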