bio 1.4.1 → 1.4.2

Files changed (61)
  1. data/ChangeLog +954 -0
  2. data/KNOWN_ISSUES.rdoc +40 -5
  3. data/README.rdoc +36 -35
  4. data/RELEASE_NOTES.rdoc +87 -59
  5. data/bioruby.gemspec +24 -2
  6. data/doc/RELEASE_NOTES-1.4.1.rdoc +104 -0
  7. data/doc/Tutorial.rd +162 -200
  8. data/doc/Tutorial.rd.html +149 -146
  9. data/lib/bio.rb +1 -0
  10. data/lib/bio/appl/blast.rb +1 -1
  11. data/lib/bio/appl/blast/ddbj.rb +26 -34
  12. data/lib/bio/appl/blast/genomenet.rb +21 -11
  13. data/lib/bio/db/embl/sptr.rb +193 -21
  14. data/lib/bio/db/fasta.rb +1 -1
  15. data/lib/bio/db/fastq.rb +14 -0
  16. data/lib/bio/db/fastq/format_fastq.rb +2 -2
  17. data/lib/bio/db/genbank/ddbj.rb +1 -2
  18. data/lib/bio/db/genbank/format_genbank.rb +1 -1
  19. data/lib/bio/db/medline.rb +1 -0
  20. data/lib/bio/db/newick.rb +3 -1
  21. data/lib/bio/db/pdb/pdb.rb +9 -9
  22. data/lib/bio/db/pdb/residue.rb +2 -2
  23. data/lib/bio/io/ddbjrest.rb +344 -0
  24. data/lib/bio/io/ncbirest.rb +121 -1
  25. data/lib/bio/location.rb +2 -2
  26. data/lib/bio/reference.rb +3 -4
  27. data/lib/bio/shell/plugin/entry.rb +7 -3
  28. data/lib/bio/shell/plugin/ncbirest.rb +5 -1
  29. data/lib/bio/util/restriction_enzyme.rb +3 -0
  30. data/lib/bio/util/restriction_enzyme/dense_int_array.rb +195 -0
  31. data/lib/bio/util/restriction_enzyme/range/sequence_range.rb +7 -7
  32. data/lib/bio/util/restriction_enzyme/range/sequence_range/calculated_cuts.rb +57 -18
  33. data/lib/bio/util/restriction_enzyme/range/sequence_range/fragment.rb +2 -2
  34. data/lib/bio/util/restriction_enzyme/sorted_num_array.rb +219 -0
  35. data/lib/bio/version.rb +1 -1
  36. data/sample/test_restriction_enzyme_long.rb +4403 -0
  37. data/test/data/fasta/EFTU_BACSU.fasta +8 -0
  38. data/test/data/genbank/CAA35997.gp +48 -0
  39. data/test/data/genbank/SCU49845.gb +167 -0
  40. data/test/data/litdb/1717226.litdb +13 -0
  41. data/test/data/pir/CRAB_ANAPL.pir +6 -0
  42. data/test/functional/bio/appl/blast/test_remote.rb +93 -0
  43. data/test/functional/bio/appl/test_blast.rb +61 -0
  44. data/test/functional/bio/io/test_ddbjrest.rb +47 -0
  45. data/test/functional/bio/test_command.rb +3 -3
  46. data/test/unit/bio/db/embl/test_sptr.rb +6 -6
  47. data/test/unit/bio/db/embl/test_uniprot_new_part.rb +208 -0
  48. data/test/unit/bio/db/genbank/test_common.rb +274 -0
  49. data/test/unit/bio/db/genbank/test_genbank.rb +401 -0
  50. data/test/unit/bio/db/genbank/test_genpept.rb +81 -0
  51. data/test/unit/bio/db/pdb/test_pdb.rb +3287 -11
  52. data/test/unit/bio/db/test_fasta.rb +34 -12
  53. data/test/unit/bio/db/test_fastq.rb +26 -0
  54. data/test/unit/bio/db/test_litdb.rb +95 -0
  55. data/test/unit/bio/db/test_medline.rb +1 -0
  56. data/test/unit/bio/db/test_nbrf.rb +82 -0
  57. data/test/unit/bio/db/test_newick.rb +22 -4
  58. data/test/unit/bio/test_reference.rb +35 -0
  59. data/test/unit/bio/util/restriction_enzyme/test_dense_int_array.rb +201 -0
  60. data/test/unit/bio/util/restriction_enzyme/test_sorted_num_array.rb +281 -0
  61. metadata +44 -38
@@ -11,29 +11,29 @@
  <h1><a name="label-0" id="label-0">BioRuby Tutorial</a></h1><!-- RDLabel: "BioRuby Tutorial" -->
  <ul>
  <li>Copyright (C) 2001-2003 KATAYAMA Toshiaki &lt;k .at. bioruby.org&gt;</li>
- <li>Copyright (C) 2005-2010 Pjotr Prins, Naohisa Goto and others</li>
+ <li>Copyright (C) 2005-2011 Pjotr Prins, Naohisa Goto and others</li>
  </ul>
- <p>This document was last modified: 2010/01/08
- Current editor: Pjotr Prins &lt;p .at. bioruby.org&gt;</p>
- <p>The latest version resides in the GIT source code repository: ./doc/<a href="http://github.com/pjotrp/bioruby/raw/documentation/doc/Tutorial.rd">Tutorial.rd</a>.</p>
+ <p>This document was last modified: 2011/03/24
+ Current editor: Michael O'Keefe &lt;okeefm (at) rpi (dot) edu&gt;</p>
+ <p>The latest version resides in the GIT source code repository: ./doc/<a href="https://github.com/bioruby/bioruby/blob/master/doc/Tutorial.rd">Tutorial.rd</a>.</p>
  <h2><a name="label-1" id="label-1">Introduction</a></h2><!-- RDLabel: "Introduction" -->
  <p>This is a tutorial for using Bioruby. A basic knowledge of Ruby is required.
- If you want to know more about the programming langauge Ruby we recommend the
+ If you want to know more about the programming language, we recommend the
  latest Ruby book <a href="http://www.pragprog.com/titles/ruby">Programming Ruby</a>
- by Dave Thomas and Andy Hunt - the first edition is online
+ by Dave Thomas and Andy Hunt - the first edition can be read online
  <a href="http://www.ruby-doc.org/docs/ProgrammingRuby/">here</a>.</p>
  <p>For BioRuby you need to install Ruby and the BioRuby package on your computer</p>
  <p>You can check whether Ruby is installed on your computer and what
  version it has with the</p>
  <pre>% ruby -v</pre>
- <p>command. Showing something like:</p>
+ <p>command. You should see something like:</p>
  <pre>ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]</pre>
  <p>If you see no such thing you'll have to install Ruby using your installation
  manager. For more information see the
  <a href="http://www.ruby-lang.org/en/">Ruby</a> website.</p>
  <p>With Ruby download and install Bioruby using the links on the
  <a href="http://bioruby.org/">Bioruby</a> website. The recommended installation is via
- Ruby gems:</p>
+ RubyGems:</p>
  <pre>gem install bio</pre>
  <p>See also the Bioruby <a href="http://bioruby.open-bio.org/wiki/Installation">wiki</a>.</p>
  <p>A lot of BioRuby's documentation exists in the source code and unit tests. To
@@ -41,9 +41,10 @@ really dive in you will need the latest source code tree. The embedded rdoc
  documentation can be viewed online at
  <a href="http://bioruby.org/rdoc/">bioruby's rdoc</a>. But first lets start!</p>
  <h2><a name="label-2" id="label-2">Trying Bioruby</a></h2><!-- RDLabel: "Trying Bioruby" -->
- <p>Bioruby comes with its own shell. After unpacking the sources run the
- following command</p>
- <pre>./bin/bioruby or
+ <p>Bioruby comes with its own shell. After unpacking the sources, run one of the following commands:</p>
+ <pre>bioruby</pre>
+ <p>or, from the source tree:</p>
+ <pre>cd bioruby
  ruby -I lib bin/bioruby</pre>
  <p>and you should see a prompt</p>
  <pre>bioruby&gt;</pre>
@@ -60,11 +61,11 @@ question to the mailing list. BioRuby developers usually try to help.</p>
  <h2><a name="label-3" id="label-3">Working with nucleic / amino acid sequences (Bio::Sequence class)</a></h2><!-- RDLabel: "Working with nucleic / amino acid sequences (Bio::Sequence class)" -->
  <p>The Bio::Sequence class allows the usual sequence transformations and
  translations. In the example below the DNA sequence "atgcatgcaaaa" is
- converted into the complemental strand, spliced into a subsequence,
- next the nucleic acid composition is calculated and the sequence is
+ converted into the complementary strand and spliced into a subsequence;
+ next, the nucleic acid composition is calculated and the sequence is
  translated into the amino acid sequence, the molecular weight
- calculated, and so on. When translating into amino acid sequences the
- frame can be specified and optionally the condon table selected (as
+ calculated, and so on. When translating into amino acid sequences, the
+ frame can be specified and optionally the codon table selected (as
  defined in codontable.rb).</p>
  <pre>bioruby&gt; seq = Bio::Sequence::NA.new("atgcatgcaaaa")
  ==&gt; "atgcatgcaaaa"
@@ -73,7 +74,7 @@ defined in codontable.rb).</p>
  bioruby&gt; seq.complement
  ==&gt; "ttttgcatgcat"
 
- bioruby&gt; seq.subseq(3,8) # gets subsequence of positions 3 to 8
+ bioruby&gt; seq.subseq(3,8) # gets subsequence of positions 3 to 8 (starting from 1)
  ==&gt; "gcatgc"
  bioruby&gt; seq.gc_percent
  ==&gt; 33
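The three shell calls in this hunk (`complement`, `subseq` and `gc_percent`) can be approximated in plain Ruby, for readers who do not have BioRuby at hand. This is a sketch, not BioRuby's implementation. The helper names are hypothetical, and the input is assumed to be a lower-case string of a, c, g and t. `subseq_1based` shows the conversion from biological 1-based, inclusive positions to Ruby's 0-based String indexing.

```ruby
# Plain-Ruby approximations of the three Bio::Sequence::NA calls above.
# Hypothetical helpers for illustration; input is a lower-case a/c/g/t string.

# Reverse complement: swap each base for its partner, then reverse.
def reverse_complement(seq)
  seq.tr('acgt', 'tgca').reverse
end

# Biological positions are 1-based and inclusive; Ruby strings are 0-based.
def subseq_1based(seq, from, to)
  raise ArgumentError, 'positions start at 1' if from <= 0 || to <= 0
  seq[(from - 1)..(to - 1)]
end

# GC content as a whole-number percentage (integer division truncates).
def gc_percent(seq)
  seq.count('gc') * 100 / seq.length
end

seq = 'atgcatgcaaaa'
puts reverse_complement(seq)   # ttttgcatgcat
puts subseq_1based(seq, 3, 8)  # gcatgc
puts gc_percent(seq)           # 33
```

The printed values match the shell session above.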
@@ -112,10 +113,10 @@ Windows). For example</p>
  <pre>% ri puts
  % ri p
  % ri File.open</pre>
- <p>Nucleic acid sequence is an object of Bio::Sequence::NA class, and
- amino acid sequence is an object of Bio::Sequence::AA class. Shared
+ <p>Nucleic acid sequences are members of the Bio::Sequence::NA class, and
+ amino acid sequences are members of the Bio::Sequence::AA class. Shared
  methods are in the parent Bio::Sequence class.</p>
- <p>As Bio::Sequence class inherits Ruby's String class, you can use
+ <p>As Bio::Sequence inherits Ruby's String class, you can use
  String class methods. For example, to get a subsequence, you can
  not only use subseq(from, to) but also String#[].</p>
  <p>Please take note that the Ruby's string's are base 0 - i.e. the first letter
@@ -128,14 +129,13 @@ bioruby&gt; s[0..1]
  ==&gt; "ab"</pre>
  <p>So when using String methods, you should subtract 1 from positions
  conventionally used in biology. (subseq method will throw an exception if you
- specify positions smaller than or equal to 0 for either one of the "from" or
- "to".)</p>
+ specify positions smaller than or equal to 0 for either one of the "from" or "to".)</p>
  <p>The window_search(window_size, step_size) method shows a typical Ruby
  way of writing concise and clear code using 'closures'. Each sliding
  window creates a subsequence which is supplied to the enclosed block
  through a variable named +s+.</p>
  <ul>
- <li><p>Show average percentage of GC content for 20 bases (stepping the default one base at a time)</p>
+ <li><p>Show average percentage of GC content for 20 bases (stepping the default one base at a time):</p>
  <pre>bioruby&gt; seq = Bio::Sequence::NA.new("atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa")
  ==&gt; "atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa"
 
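The `window_search` idea can be sketched with an ordinary block-yielding method. The code below approximates it in plain Ruby and is not BioRuby's `window_search`. `each_window` and `gc_fraction` are hypothetical names. In this sketch, a window that would run past the end of the sequence is simply not produced.

```ruby
# Sliding windows in plain Ruby, in the spirit of window_search(window_size,
# step_size). Hypothetical helpers; not BioRuby's implementation.
def each_window(seq, window_size, step_size = 1)
  return enum_for(:each_window, seq, window_size, step_size) unless block_given?
  # Start positions 0, step, 2*step, ... while a full window still fits.
  0.step(seq.length - window_size, step_size) do |start|
    yield seq[start, window_size]
  end
end

# Fraction of g and c bases in a window (0.0 to 1.0).
def gc_fraction(window)
  window.count('gc').fdiv(window.length)
end

seq = 'atgcatgcaattaagctaatcccaattagatcatcccgatcatcaaaaaaaaaa'
each_window(seq, 20) do |s|
  printf("%s %.1f%%\n", s, gc_fraction(s) * 100)
end
```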
@@ -195,8 +195,8 @@ my_naseq = Bio::Sequence::NA.new(input_seq)
  my_aaseq = my_naseq.translate
 
  puts my_aaseq</pre>
- <p>Save the program as na2aa.rb. Prepare a nucleic acid sequence
- described below and saves it as my_naseq.txt:</p>
+ <p>Save the program above as na2aa.rb. Prepare a nucleic acid sequence
+ described below and save it as my_naseq.txt:</p>
  <pre>gtggcgatctttccgaaagcgatgactggagcgaagaaccaaagcagtgacatttgtctg
  atgccgcacgtaggcctgataagacgcggacagcgtcgcatcaggcatcttgtgcaaatg
  tcggatgcggcgtga</pre>
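A compact codon table can illustrate what `translate` does inside na2aa.rb. This sketch assumes the standard ("universal") genetic code and reading frame 1. It does not handle the other frames and codon tables that BioRuby supports. Codons containing anything other than a, c, g, t or u become `X`, and the helper names are hypothetical.

```ruby
# Frame-1 translation with the standard genetic code (NCBI table 1).
# Codons are indexed by base order t, c, a, g at each of the three positions,
# so "atg" -> 2*16 + 0*4 + 3 = 35 -> 'M'. A sketch, not Bio::Sequence::NA#translate.
BASES = 'tcag'.freeze
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'.freeze

def translate_codon(codon)
  idx = codon.chars.map { |base| BASES.index(base) }
  return 'X' if idx.include?(nil) # ambiguous or unknown base
  AMINO_ACIDS[idx[0] * 16 + idx[1] * 4 + idx[2]]
end

def translate(naseq)
  # Drop whitespace/newlines and treat RNA 'u' as 't';
  # a trailing partial codon is ignored by scan.
  seq = naseq.downcase.gsub(/\s+/, '').tr('u', 't')
  seq.scan(/.../).map { |codon| translate_codon(codon) }.join
end

puts translate('atgcatgcaaaa') # MHAK
```

Applied to the contents of my_naseq.txt, for example as `translate($stdin.read)`, this table yields the same protein string that na2aa.rb prints.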
@@ -207,7 +207,7 @@ For example, translates my_naseq.txt:</p>
  <pre>% cat my_naseq.txt|ruby na2aa.rb</pre>
  <p>Outputs</p>
  <pre>VAIFPKAMTGAKNQSSDICLMPHVGLIRRGQRRIRHLVQMSDAA*</pre>
- <p>You can also write this, a bit fanciful, as a one-liner script.</p>
+ <p>You can also write this, a bit fancifully, as a one-liner script.</p>
  <pre>% ruby -r bio -e 'p Bio::Sequence::NA.new($&lt;.read).translate' my_naseq.txt</pre>
  <p>In the next section we will retrieve data from databases instead of using raw
  sequence files. One generic example of the above can be found in
@@ -215,7 +215,7 @@ sequence files. One generic example of the above can be found in
  <h2><a name="label-4" id="label-4">Parsing GenBank data (Bio::GenBank class)</a></h2><!-- RDLabel: "Parsing GenBank data (Bio::GenBank class)" -->
  <p>We assume that you already have some GenBank data files. (If you don't,
  download some .seq files from ftp://ftp.ncbi.nih.gov/genbank/)</p>
- <p>As an example we fetch the ID, definition and sequence of each entry
+ <p>As an example we will fetch the ID, definition and sequence of each entry
  from the GenBank format and convert it to FASTA. This is also an example
  script in the BioRuby distribution.</p>
  <p>A first attempt could be to use the Bio::GenBank class for reading in
@@ -256,7 +256,7 @@ ff.each_entry do |f|
  puts "nalen : " + f.nalen.to_s
  puts "naseq : " + f.naseq
  end</pre>
- <p>In above two scripts, the first arguments of Bio::FlatFile.new are
+ <p>In the above two scripts, the first arguments of Bio::FlatFile.new are
  database classes of BioRuby. This is expanded on in a later section.</p>
  <p>Again another option is to use the Bio::DB.open class:</p>
  <pre>#!/usr/bin/env ruby
@@ -311,12 +311,9 @@ end</pre>
  <ul>
  <li>Note: In this example Feature#assoc method makes a Hash from a
  feature object. It is useful because you can get data from the hash
- by using qualifiers as keys.
- (But there is a risk some information is lost when two or more
- qualifiers are the same. Therefore an Array is returned by
- Feature#feature)</li>
+ by using qualifiers as keys. However, some information may be lost when two or more qualifiers have the same name; this is why Feature#feature returns an Array.</li>
  </ul>
- <p>Bio::Sequence#splicing splices subsequence from nucleic acid sequence
+ <p>Bio::Sequence#splicing splices subsequences from nucleic acid sequences
  according to location information used in GenBank, EMBL and DDBJ.</p>
  <p>When the specified translation table is different from the default
  (universal), or when the first codon is not "atg" or the protein
@@ -332,7 +329,7 @@ bio/location.rb.</p>
  <pre>locs = Bio::Locations.new('join((8298.8300)..10206,1..855)')
  naseq.splicing(locs)</pre></li>
  </ul>
- <p>You can also use the splicing method for amino acid sequences
+ <p>You can also use this splicing method for amino acid sequences
  (Bio::Sequence::AA objects).</p>
  <ul>
  <li><p>Splicing peptide from a protein (e.g. signal peptide)</p>
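Splicing by location can be sketched for the simplest GenBank-style location strings. The helper below is hypothetical and deliberately limited. It understands `a..b`, single positions, `join(...)` of those, and one outer `complement(...)`. It does not understand fuzzy positions such as the `(8298.8300)` in the Bio::Locations example above.

```ruby
# Hypothetical helper: splice a nucleotide string using a *simple* GenBank-style
# location ("a..b", "n", "join(...)" of those, optionally wrapped in one
# "complement(...)"). Fuzzy positions like (8298.8300) are not supported.
def splice(seq, location)
  loc = location.delete(' ')
  if (m = loc.match(/\Acomplement\((.*)\)\z/))
    # Splice the inner location, then take the reverse complement.
    return splice(seq, m[1]).tr('acgt', 'tgca').reverse
  end
  if (m = loc.match(/\Ajoin\((.*)\)\z/))
    loc = m[1]
  end
  loc.split(',').map do |part|
    from, to = part.split('..').map { |n| Integer(n) } # raises on "<", ">", "("
    to ||= from
    seq[(from - 1)..(to - 1)] # positions are 1-based and inclusive
  end.join
end

naseq = 'atgcatgcaaaa'
puts splice(naseq, 'join(1..3,7..9)')  # atggca
puts splice(naseq, 'complement(1..4)') # gcat
```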
@@ -344,10 +341,7 @@ with classes like Bio::GenBank, Bio::KEGG::GENES. A full list can be found in
  the ./lib/bio/db directory of the BioRuby source tree.</p>
  <p>In many cases the Bio::DatabaseClass acts as a factory pattern
  and recognises the database type automatically - returning a
- parsed object. For example using Bio::FlatFile</p>
- <p>Bio::FlatFile class as described above. The first argument of the
- Bio::FlatFile.new is database class name in BioRuby (such as Bio::GenBank,
- Bio::KEGG::GENES and so on).</p>
+ parsed object. This is the case, for example, with the Bio::FlatFile class described above. The first argument of Bio::FlatFile.new is the name of a BioRuby database class (such as Bio::GenBank, Bio::KEGG::GENES and so on).</p>
  <pre>ff = Bio::FlatFile.new(Bio::DatabaseClass, ARGF)</pre>
  <p>Isn't it wonderful that Bio::FlatFile automagically recognizes each
  database class?</p>
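One way to picture automatic recognition is to sniff the first line of the input. The function below is a much-simplified, hypothetical illustration and does not reflect how Bio::FlatFile actually decides. The four prefixes it checks (`>` for FASTA, `LOCUS` for GenBank, `ID` for EMBL, `@` for FASTQ) were chosen for the example.

```ruby
# A much-simplified, hypothetical illustration of format sniffing: look at the
# first non-blank line and match a few well-known prefixes. Bio::FlatFile's
# real autodetection is more thorough; this only shows the idea.
FORMAT_RULES = [
  [/\A>/,       :fasta],
  [/\ALOCUS\s/, :genbank],
  [/\AID\s/,    :embl],
  [/\A@/,       :fastq]
].freeze

def guess_format(text)
  first = text.each_line.find { |line| !line.strip.empty? }
  return :unknown if first.nil?
  rule = FORMAT_RULES.find { |pattern, _format| pattern.match?(first) }
  rule ? rule[1] : :unknown
end

p guess_format(">sp|P10425| example\nMKV\n")            # :fasta
p guess_format("LOCUS       SCU49845     5028 bp    DNA\n") # :genbank
```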
@@ -361,16 +355,15 @@ ff.each_entry do |entry|
  p entry.definition # definition of the entry
  p entry.seq # sequence data of the entry
  end</pre>
- <p>An example that can take any input, filter using a regular expression to output
+ <p>An example that can take any input, filter using a regular expression and output
  to a FASTA file can be found in sample/any2fasta.rb. With this technique it is
  possible to write a Unix type grep/sort pipe for sequence information. One
  example using scripts in the BIORUBY sample folder:</p>
  <pre>fastagrep.rb '/At|Dm/' database.seq | fastasort.rb</pre>
- <p>greps the database for Arabidopsis and Drosophila entries and sorts the output
- to FASTA.</p>
+ <p>greps the database for Arabidopsis and Drosophila entries and sorts the resulting FASTA output.</p>
  <p>Other methods to extract specific data from database objects can be
  different between databases, though some methods are common (see the
- guidelines for common methods as described in bio/db.rb).</p>
+ guidelines for common methods in bio/db.rb).</p>
  <ul>
  <li>entry_id --&gt; gets ID of the entry</li>
  <li>definition --&gt; gets definition of the entry</li>
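A grep-style filter like the fastagrep.rb pipeline above can be sketched without BioRuby. This is not the code of sample/fastagrep.rb. `parse_fasta`, `fasta_grep` and the `FastaEntry` struct are hypothetical, and the parser assumes well-formed FASTA in which every record starts with `>`. Sorting is left out.

```ruby
# Hypothetical fastagrep-style filter: parse FASTA text and keep the entries
# whose definition line matches a regular expression.
FastaEntry = Struct.new(:definition, :seq) do
  # Re-emit the entry as FASTA, wrapping the sequence at `width` characters.
  def to_fasta(width = 60)
    ">#{definition}\n" + seq.scan(/.{1,#{width}}/).join("\n") + "\n"
  end
end

def parse_fasta(text)
  text.split(/^>/).reject { |chunk| chunk.strip.empty? }.map do |chunk|
    header, *lines = chunk.lines.map(&:chomp)
    FastaEntry.new(header, lines.join.delete(" \t"))
  end
end

def fasta_grep(text, pattern)
  parse_fasta(text).select { |entry| pattern.match?(entry.definition) }
end

db = ">At1 gene\nACGT\n>Hs2 gene\nGGCC\n>Dm3 gene\nTTAA\n"
fasta_grep(db, /At|Dm/).each { |entry| print entry.to_fasta }
```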
@@ -380,14 +373,13 @@ guidelines for common methods as described in bio/db.rb).</p>
  </ul>
  <p>Refer to the documents of each database to find the exact naming
  of the included methods.</p>
- <p>In principal BioRuby uses the following conventions: when a method
- name is plural the method returns some object as an Array. For
+ <p>In general, BioRuby uses the following conventions: when a method
+ name is plural, the method returns some object as an Array. For
  example, some classes have a "references" method which returns
  multiple Bio::Reference objects as an Array. And some classes have a
  "reference" method which returns a single Bio::Reference object.</p>
  <h3><a name="label-6" id="label-6">Alignments (Bio::Alignment)</a></h3><!-- RDLabel: "Alignments (Bio::Alignment)" -->
- <p>Bio::Alignment class in bio/alignment.rb is a container class like Ruby's Hash,
- Array and BioPerl's Bio::SimpleAlign. A very simple example is:</p>
+ <p>The Bio::Alignment class in bio/alignment.rb is a container class like Ruby's Hash and Array classes and BioPerl's Bio::SimpleAlign. A very simple example is:</p>
  <pre>bioruby&gt; seqs = [ 'atgca', 'aagca', 'acgca', 'acgcg' ]
  bioruby&gt; seqs = seqs.collect{ |x| Bio::Sequence::NA.new(x) }
  # creates alignment object
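Column-wise operations become straightforward once an alignment is treated as a set of equal-length rows. The sketch below is not Bio::Alignment. It walks the columns of the four example sequences and builds a strict consensus that keeps a residue only when every row agrees, writing `?` otherwise. It does not model any gap or threshold options that Bio::Alignment may have, and the method names are hypothetical.

```ruby
# Plain-Ruby sketch (not Bio::Alignment): equal-length aligned rows, walked
# column by column ("sites"), plus a strict consensus using '?' for disagreement.
def each_site(seqs)
  return enum_for(:each_site, seqs) unless block_given?
  length = seqs.first.length
  unless seqs.all? { |s| s.length == length }
    raise ArgumentError, 'sequences must be aligned (equal length)'
  end
  (0...length).each { |i| yield seqs.map { |s| s[i] } }
end

def strict_consensus(seqs, missing = '?')
  each_site(seqs).map { |site| site.uniq.size == 1 ? site.first : missing }.join
end

seqs = %w[atgca aagca acgca acgcg]
each_site(seqs) { |site| p site }
puts strict_consensus(seqs) # a?gc?
```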
@@ -417,15 +409,32 @@ a.each_site { |x| p x }
  # clustalw command must be installed.
  factory = Bio::ClustalW.new
  a2 = a.do_align(factory)</pre>
+ <p>Read a ClustalW or Muscle 'ALN' alignment file:</p>
+ <pre>bioruby&gt; aln = Bio::ClustalW::Report.new(File.read('../test/data/clustalw/example1.aln'))
+ bioruby&gt; aln.header
+ ==&gt; "CLUSTAL 2.0.9 multiple sequence alignment"</pre>
+ <p>Fetch a sequence:</p>
+ <pre>bioruby&gt; seq = aln.get_sequence(1)
+ bioruby&gt; seq.definition
+ ==&gt; "gi|115023|sp|P10425|"</pre>
+ <p>Get a partial sequence:</p>
+ <pre>bioruby&gt; seq.to_s[60..120]
+ ==&gt; "LGYFNG-EAVPSNGLVLNTSKGLVLVDSSWDNKLTKELIEMVEKKFQKRVTDVIITHAHAD"</pre>
+ <p>Show the full alignment residue match information for the sequences in the set:</p>
+ <pre>bioruby&gt; aln.match_line[60..120]
+ ==&gt; " . **. . .. ::*: . * : : . .: .* * *"</pre>
+ <p>Get the Bio::Alignment object and compute its consensus:</p>
+ <pre>bioruby&gt; aln.alignment.consensus[60..120]
+ ==&gt; "???????????SN?????????????D??????????L??????????????????H?H?D"</pre>
  <h2><a name="label-7" id="label-7">Restriction Enzymes (Bio::RE)</a></h2><!-- RDLabel: "Restriction Enzymes (Bio::RE)" -->
  <p>BioRuby has extensive support for restriction enzymes (REs). It contains a full
  library of commonly used REs (from REBASE) which can be used to cut single
- stranded RNA or dubbel stranded DNA into fragments. To list all enzymes:</p>
+ stranded RNA or double stranded DNA into fragments. To list all enzymes:</p>
  <pre>rebase = Bio::RestrictionEnzyme.rebase
  rebase.each do |enzyme_name, info|
  p enzyme_name
  end</pre>
- <p>and cut a sequence with an enzyme follow up with:</p>
+ <p>and to cut a sequence with an enzyme, follow up with:</p>
  <pre>res = seq.cut_with_enzyme('EcoRII', {:max_permutations =&gt; 0},
  {:view_ranges =&gt; true})
  if res.kind_of? Symbol #error
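Conceptually, cutting with an enzyme means locating its recognition site and splitting the sequence at a fixed offset. The sketch below is hypothetical and much narrower than `cut_with_enzyme`. It handles a single non-degenerate site on the top strand only. It uses EcoRI as the example (recognition site GAATTC, cut after the G) rather than the EcoRII enzyme above, whose site CCWGG contains the ambiguity code W.

```ruby
# Hypothetical sketch: find top-strand cut positions for a fixed,
# non-degenerate recognition site and split the sequence into fragments.
# Degenerate sites, sticky-end bookkeeping and the bottom strand are ignored.
def cut_positions(seq, site, cut_offset)
  positions = []
  start = 0
  while (i = seq.index(site, start))
    positions << i + cut_offset # 0-based index where the top strand is cut
    start = i + 1               # allow overlapping matches
  end
  positions
end

def digest(seq, site, cut_offset)
  cuts = cut_positions(seq.downcase, site.downcase, cut_offset)
  bounds = [0, *cuts, seq.length]
  bounds.each_cons(2).map { |a, b| seq[a...b] }.reject(&:empty?)
end

# EcoRI: G^AATTC, i.e. the cut falls 1 base into the recognition site.
p digest('aagaattcttgaattcaa', 'GAATTC', 1) # ["aag", "aattcttg", "aattcaa"]
```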
@@ -451,12 +460,12 @@ res.each do |frag|
  <p>Let's start with a query.pep file which contains a sequence in FASTA
  format. In this example we are going to execute a homology search
  from a remote internet site or on your local machine. Note that you
- can use the ssearch program instead of fasta when you use them in your
+ can use the ssearch program instead of fasta when you use it in your
  local machine.</p>
  <h3><a name="label-9" id="label-9">using FASTA in local machine</a></h3><!-- RDLabel: "using FASTA in local machine" -->
  <p>Install the fasta program on your machine (the command name looks like
- fasta34. FASTA can be downloaded from ftp://ftp.virginia.edu/pub/fasta/).
- First, you must prepare your FASTA-formatted database sequence file
+ fasta34. FASTA can be downloaded from ftp://ftp.virginia.edu/pub/fasta/).</p>
+ <p>First, you must prepare your FASTA-formatted database sequence file
  target.pep and FASTA-formatted query.pep. </p>
  <pre>#!/usr/bin/env ruby
 
@@ -489,19 +498,18 @@ ff.each do |entry|
  end
  end
  end</pre>
- <p>We named above script as f_search.rb. You can execute as follows:</p>
+ <p>We named the above script f_search.rb. You can execute it as follows:</p>
  <pre>% ./f_search.rb query.pep target.pep &gt; f_search.out</pre>
  <p>In above script, the variable "factory" is a factory object for executing
  FASTA many times easily. Instead of using Fasta#query method,
  Bio::Sequence#fasta method can be used.</p>
  <pre>seq = "&gt;test seq\nYQVLEEIGRGSFGSVRKVIHIPTKKLLVRKDIKYGHMNSKE"
  seq.fasta(factory)</pre>
- <p>When you want to add options to FASTA command, you can set the
- third argument of Bio::Fasta.local method. For example, setting ktup to 1
- and getting top-10 hits:</p>
+ <p>When you want to add options to FASTA commands, you can set the
+ third argument of the Bio::Fasta.local method. For example, the following sets ktup to 1 and gets a list of the top 10 hits:</p>
  <pre>factory = Bio::Fasta.local('fasta34', 'target.pep', '-b 10')
  factory.ktup = 1</pre>
- <p>Bio::Fasta#query returns Bio::Fasta::Report object.
+ <p>Bio::Fasta#query returns a Bio::Fasta::Report object.
  We can get almost all information described in FASTA report text
  with the Report object. For example, getting information for hits:</p>
  <pre>report.each do |hit|
@@ -527,11 +535,10 @@ with the Report object. For example, getting information for hits:</p>
  # in hit(target) sequence
  puts hit.lap_at # array of above four numbers
  end</pre>
- <p>Most of above methods are common with the Bio::Blast::Report described
- below. Please refer to document of Bio::Fasta::Report class for
+ <p>Most of the above methods are common to the Bio::Blast::Report described
+ below. Please refer to the documentation of the Bio::Fasta::Report class for
  FASTA-specific details.</p>
- <p>If you need original output text of FASTA program you can use the "output"
- method of the factory object after the "query" method.</p>
+ <p>If you need the original output text of the FASTA program, you can use the "output" method of the factory object after the "query" method.</p>
  <pre>report = factory.query(entry)
  puts factory.output</pre>
  <h3><a name="label-10" id="label-10">using FASTA from a remote internet site</a></h3><!-- RDLabel: "using FASTA from a remote internet site" -->
@@ -558,7 +565,7 @@ same things as with a local method.</p>
  <p>Select the databases you require. Next, give the search program from
  the type of query sequence and database.</p>
  <ul>
- <li>When query is a amino acid sequence
+ <li>When query is an amino acid sequence
  <ul>
  <li>When protein database, program is "fasta".</li>
  <li>When nucleic database, program is "tfasta".</li>
@@ -566,10 +573,10 @@ the type of query sequence and database.</p>
  <li>When query is a nucleic acid sequence
  <ul>
  <li>When nucleic database, program is "fasta".</li>
- <li>(When protein database, you would fail to search.)</li>
+ <li>(When protein database, the search would fail.)</li>
  </ul></li>
  </ul>
- <p>For example:</p>
+ <p>For example, run:</p>
  <pre>program = 'fasta'
  database = 'genes'
 
@@ -600,7 +607,7 @@ The parameter "program" is different from FASTA - as you can expect:</p>
  <p>Bio::BLAST uses "-m 7" XML output of BLAST by default when either
  XMLParser or REXML (both of them are XML parser libraries for Ruby -
  of the two XMLParser is the fastest) is installed on your computer. In
- Ruby version 1.8.0, or later, REXML is bundled with Ruby's
+ Ruby version 1.8.0 or later, REXML is bundled with Ruby's
  distribution.</p>
  <p>When no XML parser library is present, Bio::BLAST uses "-m 8" tabular
  deliminated format. Available information is limited with the
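For reference, a "-m 8" line has 12 tab-separated columns, in this order:

1. query ID
2. subject ID
3. percent identity
4. alignment length
5. mismatches
6. gap openings
7. query start
8. query end
9. subject start
10. subject end
11. E-value
12. bit score

The parser below is a minimal sketch for cases where only that tabular text is available. `TabularHit` and `parse_tabular` are hypothetical names. Within BioRuby, Bio::Blast reads this format itself.

```ruby
# Minimal sketch: parse BLAST "-m 8" tabular output (12 tab-separated columns).
TabularHit = Struct.new(:query_id, :target_id, :percent_identity, :alignment_length,
                        :mismatches, :gap_opens, :query_start, :query_end,
                        :target_start, :target_end, :evalue, :bit_score)

def parse_tabular(text)
  rows = text.each_line.reject { |l| l.strip.empty? || l.start_with?('#') }
  rows.map do |line|
    f = line.chomp.split("\t")
    raise ArgumentError, "expected 12 columns, got #{f.size}" unless f.size == 12
    # Columns 4-10 are integers; identity, E-value and bit score are floats.
    TabularHit.new(f[0], f[1], Float(f[2]), *f[3..9].map { |x| Integer(x) },
                   Float(f[10]), Float(f[11]))
  end
end

line = "q1\tsp|P1|\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380.2\n"
parse_tabular(line).each do |hit|
  puts "#{hit.target_id} #{hit.percent_identity}% E=#{hit.evalue}"
end
```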
@@ -631,9 +638,9 @@ midline.</p>
  puts hit.lap_at
  end</pre>
  <p>For simplicity and API compatibility, some information such as score
- are extracted from the first Hsp (High-scoring Segment Pair).</p>
+ is extracted from the first Hsp (High-scoring Segment Pair).</p>
  <p>Check the documentation for Bio::Blast::Report to see what can be
- retrieved. For now suffice to state that Bio::Blast::Report has a
+ retrieved. For now, suffice it to say that Bio::Blast::Report has a
  hierarchical structure mirroring the general BLAST output stream:</p>
  <ul>
  <li>In a Bio::Blast::Report object, @iterations is an array of
@@ -699,11 +706,10 @@ want to add other sites, you must write the following:</p>
  named "exec_MYSITE" to get query sequence and to pass the result to
  Bio::Blast::Report.new(or Bio::Blast::Default::Report.new):</p>
  <pre>factory = Bio::Blast.remote(program, db, option, 'MYSITE')</pre>
- <p>When you write above routines, please send to the BioRuby project and
- they may be included.</p>
+ <p>When you write the above routines, please send them to the BioRuby project, and they may be included in future releases.</p>
  <h2><a name="label-14" id="label-14">Generate a reference list using PubMed (Bio::PubMed)</a></h2><!-- RDLabel: "Generate a reference list using PubMed (Bio::PubMed)" -->
  <p>Nowadays using NCBI E-Utils is recommended. Use Bio::PubMed.esearch
- and Bio::PubMed.efetch instead of above methods.</p>
+ and Bio::PubMed.efetch.</p>
  <pre>#!/usr/bin/env ruby
 
  require 'bio'
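Bio::PubMed.esearch and Bio::PubMed.efetch query NCBI's E-utilities web service. The sketch below only builds request URLs and sends nothing. It assumes the public E-utilities base URL and the `db`, `term`, `retmax`, `id`, `rettype` and `retmode` parameters described in NCBI's documentation. The helper names are hypothetical.

```ruby
require 'uri'

# Builds (but does not send) NCBI E-utilities request URLs. Base URL and
# parameter names follow NCBI's public E-utilities documentation.
EUTILS_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/'.freeze

# esearch: find PubMed IDs matching a query term.
def esearch_url(term, db: 'pubmed', retmax: 20)
  URI(EUTILS_BASE + 'esearch.fcgi?' +
      URI.encode_www_form(db: db, term: term, retmax: retmax))
end

# efetch: retrieve records (MEDLINE text by default) for one or more IDs.
def efetch_url(ids, db: 'pubmed', rettype: 'medline', retmode: 'text')
  URI(EUTILS_BASE + 'efetch.fcgi?' +
      URI.encode_www_form(db: db, id: Array(ids).join(','),
                          rettype: rettype, retmode: retmode))
end

puts esearch_url('genome bioinformatics')
puts efetch_url([10592173])
```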
@@ -741,7 +747,7 @@ BibTeX format bibliography data to a file named genoinfo.bib.</p>
  % ./pmsearch.rb genome bioinformatics &gt;&gt; genoinfo.bib</pre>
  <p>The BibTeX can be used with Tex or LaTeX to form bibliography
  information with your journal article. For more information
- on BibTex see (EDITORS NOTE: insert URL). A quick example:</p>
+ on using BibTeX, see the <a href="http://www.bibtex.org/Using/">BibTeX HowTo site</a>. A quick example:</p>
  <p>Save this to hoge.tex:</p>
  <pre>\documentclass{jarticle}
  \begin{document}
@@ -754,12 +760,11 @@ foo bar KEGG database~\cite{PMID:10592173} baz hoge fuga.
  % bibtex hoge # processes genoinfo.bib
  % latex hoge # creates bibliography list
  % latex hoge # inserts correct bibliography reference</pre>
- <p>Now, you get hoge.dvi and hoge.ps - the latter you can view any
- Postscript viewer.</p>
+ <p>Now you get hoge.dvi and hoge.ps, the latter of which can be viewed with any PostScript viewer.</p>
  <h3><a name="label-16" id="label-16">Bio::Reference#bibitem</a></h3><!-- RDLabel: "Bio::Reference#bibitem" -->
  <p>When you don't want to create a bib file, you can use
  Bio::Reference#bibitem method instead of Bio::Reference#bibtex.
- In above pmfetch.rb and pmsearch.rb scripts, change</p>
+ In the above pmfetch.rb and pmsearch.rb scripts, change</p>
  <pre>puts reference.bibtex</pre>
  <p>to</p>
  <pre>puts reference.bibitem</pre>
@@ -801,12 +806,12 @@ BioRuby and other projects' members (2002).</p>
  </ul></li>
  <li>BioSQL
  <ul>
- <li>Schemas to store sequence data to relational database such as
+ <li>Schemas to store sequence data in relational databases such as
  MySQL and PostgreSQL, and methods to retrieve entries from the database.</li>
  </ul></li>
  </ul>
- <p>Here we give a quick overview. Check out
- <a href="http://obda.open-bio.org/">&lt;URL:http://obda.open-bio.org/&gt;</a> for more extensive details.</p>
+ <p>This tutorial only gives a quick overview of OBDA. Check out
+ <a href="http://obda.open-bio.org">the OBDA site</a> for more extensive details.</p>
  <h2><a name="label-18" id="label-18">BioRegistry</a></h2><!-- RDLabel: "BioRegistry" -->
  <p>BioRegistry allows for locating retrieval methods and database
  locations through configuration files. The priorities are</p>
@@ -821,14 +826,14 @@ when all local configulation files are not available.</p>
  <p>In the current BioRuby implementation all local configulation files
  are read. For databases with the same name settings encountered first
  are used. This means that if you don't like some settings of a
- database in system global configuration file
- (/etc/bioinformatics/seqdatabase.ini), you can easily override it by
+ database in the system's global configuration file
+ (/etc/bioinformatics/seqdatabase.ini), you can easily override them by
  writing settings to ~/.bioinformatics/seqdatabase.ini.</p>
  <p>The syntax of the configuration file is called a stanza format. For example</p>
  <pre>[DatabaseName]
  protocol=ProtocolName
- location=ServeName</pre>
- <p>You can write a description like above entry for every database.</p>
+ location=ServerName</pre>
+ <p>You can write a description like the above entry for every database.</p>
  <p>The database name is a local label for yourself, so you can name it
  freely and it can differ from the name of the actual databases. In the
  actual specification of BioRegistry where there are two or more
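A few lines of Ruby are enough to sketch reading stanza files in priority order, with the first-found-wins rule described above for BioRuby. This is a hypothetical parser, not BioRuby's. It assumes `key=value` lines under `[DatabaseName]` headers and skips lines starting with `#` or `;`. It ignores anything that appears before the first header. The locations in the usage lines are placeholders.

```ruby
# Hypothetical parser for stanza-format files such as seqdatabase.ini:
#   [DatabaseName]
#   protocol=ProtocolName
#   location=ServerName
def parse_stanzas(text)
  stanzas = {}
  current = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?('#', ';')
    if (m = line.match(/\A\[(.+)\]\z/))
      current = (stanzas[m[1].strip] ||= {})
    elsif current && (m = line.match(/\A([^=]+)=(.*)\z/))
      current[m[1].strip] = m[2].strip
    end # lines before the first [header] are ignored
  end
  stanzas
end

# Files are given in priority order; the first settings found for a name win.
def merge_first_wins(texts)
  texts.each_with_object({}) do |text, db|
    parse_stanzas(text).each { |name, settings| db[name] ||= settings }
  end
end

user_ini   = "[genbank]\nprotocol=biofetch\nlocation=http://example.org/biofetch\n"
system_ini = "[genbank]\nprotocol=index-flat\nlocation=/data/gb\n" \
             "[embl]\nprotocol=biosql\nlocation=db.example.org\n"
p merge_first_wins([user_ini, system_ini])
```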
@@ -836,8 +841,8 @@ settings for a database of the same name, it is proposed that
  connection to the database is tried sequentially with the order
  written in configuration files. However, this has not (yet) been
  implemented in BioRuby.</p>
- <p>In addition, for some protocol, you must set additional options
- other than locations (e.g. user name of MySQL). In the BioRegistory
+ <p>In addition, for some protocols, you must set additional options
+ other than locations (e.g. user name for MySQL). In the BioRegistry
  specification, current available protocols are:</p>
  <ul>
  <li>index-flat</li>
@@ -850,8 +855,7 @@ specification, current available protocols are:</p>
  <p>In BioRuby, you can use index-flat, index-berkleydb, biofetch and biosql.
  Note that the BioRegistry specification sometimes gets updated and BioRuby
  does not always follow quickly.</p>
- <p>Here an example. Create a Bio::Registry object. It reads the configuration
- files:</p>
+ <p>Here is an example. It creates a Bio::Registry object and reads the configuration files:</p>
  <pre>reg = Bio::Registry.new
 
  # connects to the database "genbank"
@@ -859,42 +863,39 @@ serv = reg.get_database('genbank')
859
863
 
860
864
  # gets entry of the ID
861
865
  entry = serv.get_by_id('AA2CG')</pre>
862
- <p>The variable "serv" is a server object corresponding to the setting
863
- written in configuration files. The class of the object is one of
866
+ <p>The variable "serv" is a server object corresponding to the settings
867
+ written in the configuration files. The class of the object is one of
864
868
  Bio::SQL, Bio::Fetch, and so on. Note that Bio::Registry#get_database("name")
865
869
  returns nil if no database is found.</p>
866
- <p>After that, you can use get_by_id method and some specific methods.
867
- Please refer to below documents.</p>
870
+ <p>After that, you can use the get_by_id method as well as methods specific to each server class.
871
+ Please refer to the sections below for more information.</p>
868
872
  <h2><a name="label-19" id="label-19">BioFlat</a></h2><!-- RDLabel: "BioFlat" -->
869
873
  <p>BioFlat is a mechanism to create index files of flat files and to retrieve
870
874
  these entries fast. There are two index types. index-flat is a simple index
871
- performing binary search without using an external library of Ruby. index-berkeleydb
875
+ performing binary search without requiring any external Ruby libraries. index-berkeleydb
872
876
  uses Berkeley DB for indexing - but requires installing bdb on your computer,
873
- as well as the BDB Ruby package. For creating the index itself, you can use
874
- br_bioflat.rb command bundled with BioRuby.</p>
877
+ as well as the BDB Ruby package. To create the index itself, you can use the br_bioflat.rb command bundled with BioRuby.</p>
875
878
  <pre>% br_bioflat.rb --makeindex database_name [--format data_format] filename...</pre>
876
879
  <p>The format can be omitted because BioRuby has autodetection. If that
877
- does not work you can try specifying data format as a name of BioRuby
878
- database class.</p>
880
+ doesn't work, you can try specifying the data format as the name of a BioRuby database class.</p>
879
881
  <p>Search and retrieve data from database:</p>
880
882
  <pre>% br_bioflat.rb database_name identifier</pre>
881
- <p>For example, to create index of GenBank files gbbct*.seq and get entry
882
- from the database:</p>
883
+ <p>For example, to create an index of the GenBank files gbbct*.seq and retrieve the entry A16STM262 from it:</p>
883
884
  <pre>% br_bioflat.rb --makeindex my_bctdb --format GenBank gbbct*.seq
884
885
  % br_bioflat.rb my_bctdb A16STM262</pre>
885
886
  <p>If you have Berkeley DB on your system and installed the bdb extension
886
- module of Ruby (see http://raa.ruby-lang.org/project/bdb/), you can
887
+ module of Ruby (see <a href="http://raa.ruby-lang.org/project/bdb/">the BDB project page</a>), you can
887
888
  create and search indexes with Berkeley DB - a very fast alternative
888
889
  that uses little computer memory. When creating the index, use the
889
890
  "--makeindex-bdb" option instead of "--makeindex".</p>
890
891
  <pre>% br_bioflat.rb --makeindex-bdb database_name [--format data_format] filename...</pre>
891
892
  <h2><a name="label-20" id="label-20">BioFetch</a></h2><!-- RDLabel: "BioFetch" -->
892
893
  <pre>Note: this section is an advanced topic</pre>
893
- <p>BioFetch is a database retrieval mechanism via CGI. CGI Parameters,
894
- options and error codes are standardized. There client access via
894
+ <p>BioFetch is a database retrieval mechanism via CGI. CGI parameters,
895
+ options and error codes are standardized. Client access via
895
896
  http is possible giving the database name, identifiers and format to
896
897
  retrieve entries.</p>
897
- <p>The BioRuby project has a BioFetch server in bioruby.org. It uses
898
+ <p>The BioRuby project has a BioFetch server at bioruby.org. It uses
898
899
  GenomeNet's DBGET system as a backend. The source code of the
899
900
  server is in sample/ directory. Currently, there are only two
900
901
  BioFetch servers in the world: bioruby.org and EBI.</p>
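<p>As a rough illustration of the CGI interface described above, the following plain-Ruby sketch only builds a BioFetch-style request URL. It does not contact a server. The helper name biofetch_url is hypothetical. The parameter names db, id, format and style are assumptions based on common BioFetch conventions and are not BioRuby's own API. In practice, Bio::Fetch builds the request for you.</p>

```ruby
require 'uri'

# Hypothetical helper: builds a BioFetch-style GET URL.
# The parameter names (db, id, style, format) are assumed from common
# BioFetch conventions; check your server's documentation before use.
def biofetch_url(server, db, id, format: nil, style: 'raw')
  params = { 'db' => db, 'id' => id, 'style' => style }
  params['format'] = format if format   # format is optional
  uri = URI(server)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts biofetch_url('http://bioruby.org/cgi-bin/biofetch.rb', 'genbank', 'AA2CG')
```

The sketch uses URI.encode_www_form so that database names and identifiers are escaped consistently, rather than being concatenated by hand.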
@@ -912,18 +913,18 @@ entry = serv.fetch(db_name, entry_id)</pre></li>
912
913
  serv = reg.get_database('genbank')
913
914
  entry = serv.get_by_id('AA2CG')</pre></li>
914
915
  </ol>
915
- <p>If you want to use (4), you, obviously, have to include some settings
916
- in seqdatabase.ini. E.g.</p>
916
+ <p>If you want to use (4), you have to include some settings
917
+ in seqdatabase.ini. For example:</p>
917
918
  <pre>[genbank]
918
919
  protocol=biofetch
919
920
  location=http://bioruby.org/cgi-bin/biofetch.rb
920
921
  biodbname=genbank</pre>
921
922
  <h3><a name="label-21" id="label-21">The combination of BioFetch, Bio::KEGG::GENES and Bio::AAindex1</a></h3><!-- RDLabel: "The combination of BioFetch, Bio::KEGG::GENES and Bio::AAindex1" -->
922
- <p>Bioinformatics is often about glueing things together. Here we give an
923
- example to get the bacteriorhodopsin gene (VNG1467G) of the archaea
924
- Halobacterium from KEGG GENES database and to get alpha-helix index
923
+ <p>Bioinformatics is often about gluing things together. Here is an
924
+ example that gets the bacteriorhodopsin gene (VNG1467G) of the archaeon
925
+ Halobacterium from the KEGG GENES database, gets the alpha-helix index
925
926
  data (BURA740101) from the AAindex (Amino acid indices and similarity
926
- matrices) database, and show the helix score for each 15-aa length
927
+ matrices) database, and shows the helix score for each 15-aa length
927
928
  overlapping window.</p>
928
929
  <pre>#!/usr/bin/env ruby
929
930
 
@@ -943,14 +944,14 @@ aaseq.window_search(win_size) do |subseq|
943
944
  puts [ position, score ].join("\t")
944
945
  position += 1
945
946
  end</pre>
946
- <p>The special method Bio::Fetch.query uses preset BioFetch server
947
- in bioruby.org. (The server internally get data from GenomeNet.
947
+ <p>The special method Bio::Fetch.query uses the preset BioFetch server
948
+ at bioruby.org. (The server internally gets data from GenomeNet.
948
949
  Because the KEGG/GENES database and AAindex database are not available
949
- from other BioFetch servers, we used bioruby.org server with
950
+ from other BioFetch servers, we used the bioruby.org server with
950
951
  Bio::Fetch.query method.)</p>
951
952
  <h2><a name="label-22" id="label-22">BioSQL</a></h2><!-- RDLabel: "BioSQL" -->
952
- <p>BioSQL is a well known schema to store and retrive biological sequences using a RDBMS like PostgreSQL or MySQL; note that SQLite is not supported.
953
- First of all, you must install a database engine or have access to a remote one. Then create the schema and populate with the taxonomy. You can follow the <a href="http://code.open-bio.org/svnweb/index.cgi/biosql/view/biosql-schema/trunk/INSTALL">Official Guide</a> .
953
+ <p>BioSQL is a well-known schema for storing and retrieving biological sequences using an RDBMS such as PostgreSQL or MySQL; note that SQLite is not supported.
954
+ First of all, you must install a database engine or have access to a remote one. Then create the schema and populate it with the taxonomy. You can follow the <a href="http://code.open-bio.org/svnweb/index.cgi/biosql/view/biosql-schema/trunk/INSTALL">Official Guide</a> to accomplish these steps.
954
955
  Next step is to install these gems:</p>
955
956
  <ul>
956
957
  <li>ActiveRecord</li>
@@ -958,21 +959,22 @@ Next step is to install these gems:</p>
958
959
  <li>The layer to comunicate with you preferred RDBMS (postgresql, mysql, jdbcmysql in case you are running JRuby )</li>
959
960
  </ul>
960
961
  <p>You can find ActiveRecord's models in /bioruby/lib/bio/io/biosql</p>
961
- <p>When you have your database up and running, you can connect to it in this way:</p>
962
+ <p>When you have your database up and running, you can connect to it like this:</p>
962
963
  <pre>#!/usr/bin/env ruby
963
964
 
964
965
  require 'bio'
965
966
 
966
967
  connection = Bio::SQL.establish_connection({'development'=&gt;{'hostname'=&gt;"YourHostname",
967
- 'database'=&gt;"CoolBioSeqDB",
968
- 'adapter'=&gt;"jdbcmysql",
969
- 'username'=&gt;"YourUser",
970
- 'password'=&gt;"YouPassword"
971
- }
972
- },
973
- 'development')
968
+ 'database'=&gt;"CoolBioSeqDB",
969
+ 'adapter'=&gt;"jdbcmysql",
970
+ 'username'=&gt;"YourUser",
971
+ 'password'=&gt;"YouPassword"
972
+ }
973
+ },
974
+ 'development')
974
975
 
975
- #The first parameter is the hash contaning the description of the configuration similar to database.yml in Rails application, you can declare different environment. The second parameter is the environment to use: 'development', 'test', 'production'.
976
+ #The first parameter is the hash containing the description of the configuration; as in database.yml in Rails applications, you can declare different environments.
977
+ #The second parameter is the environment to use: 'development', 'test', or 'production'.
976
978
 
977
979
  #To store a sequence into the database you simply need a biosequence object.
978
980
  biosql_database = Bio::SQL::Biodatabase.find(:first)
@@ -991,27 +993,28 @@ Bio::SQL.list_databases
991
993
  #retriving a generic accession
992
994
  bioseq = Bio::SQL.fetch_accession("YouAccession")
993
995
 
994
- #If you use biosequence objects, you will find all its method mapped to BioSQL sequences. But you can also access to the models directly:
996
+ #If you use biosequence objects, you will find all their methods mapped to BioSQL sequences.
997
+ #But you can also access the models directly:
995
998
 
996
- #get the raw sequence associated with you accession
999
+ #get the raw sequence associated with your accession
997
1000
  bioseq.entry.biosequence
998
1001
 
999
- #get the length of your sequence, this is the explicit form of bioseq.length
1002
+ #get the length of your sequence; this is the explicit form of bioseq.length
1000
1003
  bioseq.entry.biosequence.length
1001
1004
 
1002
- #convert the sequence in GenBank format
1005
+ #convert the sequence into GenBank format
1003
1006
  bioseq.to_biosequence.output(:genbank)</pre>
1004
- <p>BioSQL' <a href="http://www.biosql.org/wiki/Schema_Overview">schema</a> is not so intuitive at the beginning, spend some time on understanding it, in the end if you know a little bit of rails everything will go smootly. You can find information to Annotation <a href="http://www.biosql.org/wiki/Annotation_Mapping">here</a>
1007
+ <p>BioSQL's <a href="http://www.biosql.org/wiki/Schema_Overview">schema</a> is not very intuitive for beginners, so spend some time understanding it. If you know a little Ruby on Rails, everything should go smoothly in the end. You can find information on annotation mapping <a href="http://www.biosql.org/wiki/Annotation_Mapping">here</a>.
1005
1008
  ToDo: add exemaples from George. I remember he did some cool post on BioSQL and Rails.</p>
1006
1009
  <h1><a name="label-23" id="label-23">PhyloXML</a></h1><!-- RDLabel: "PhyloXML" -->
1007
1010
  <p>PhyloXML is an XML language for saving, analyzing and exchanging data of
1008
- annotated phylogenetic trees. PhyloXML parser in BioRuby is implemented in
1009
- Bio::PhyloXML::Parser and writer in Bio::PhyloXML::Writer.
1010
- More information at www.phyloxml.org</p>
1011
+ annotated phylogenetic trees. The PhyloXML parser in BioRuby is implemented in
1012
+ Bio::PhyloXML::Parser, and its writer in Bio::PhyloXML::Writer.
1013
+ More information can be found at <a href="http://www.phyloxml.org">www.phyloxml.org</a>.</p>
1011
1014
  <h2><a name="label-24" id="label-24">Requirements</a></h2><!-- RDLabel: "Requirements" -->
1012
- <p>In addition to BioRuby library you need a libxml ruby bindings. To install:</p>
1015
+ <p>In addition to BioRuby, you need the libxml Ruby bindings. To install, execute:</p>
1013
1016
  <pre>% gem install -r libxml-ruby</pre>
1014
- <p>For more information see <a href="http://libxml.rubyforge.org/install.xml">&lt;URL:http://libxml.rubyforge.org/install.xml&gt;</a></p>
1017
+ <p>For more information, see the <a href="http://libxml.rubyforge.org/install.xml">libxml-ruby installation page</a>.</p>
1015
1018
  <h2><a name="label-25" id="label-25">Parsing a file</a></h2><!-- RDLabel: "Parsing a file" -->
1016
1019
  <pre>require 'bio'
1017
1020
 
@@ -1022,9 +1025,9 @@ phyloxml = Bio::PhyloXML::Parser.open('example.xml')
1022
1025
  phyloxml.each do |tree|
1023
1026
  puts tree.name
1024
1027
  end</pre>
1025
- <p>If there are several trees in the file, you can access the one you wish by an index</p>
1028
+ <p>If there are several trees in the file, you can access the one you wish by specifying its index:</p>
1026
1029
  <pre>tree = phyloxml[3]</pre>
1027
- <p>You can use all Bio::Tree methods on the tree, since PhyloXML::Tree inherits from Bio::Tree. For example,</p>
1030
+ <p>You can use all Bio::Tree methods on the tree, since PhyloXML::Tree inherits from Bio::Tree. For example, to print the name of every leaf node:</p>
1028
1031
  <pre>tree.leaves.each do |node|
1029
1032
  puts node.name
1030
1033
  end</pre>
@@ -1045,7 +1048,7 @@ writer.write(tree1)
1045
1048
  # Add another tree to the file
1046
1049
  writer.write(tree2)</pre>
1047
1050
  <h2><a name="label-27" id="label-27">Retrieving data</a></h2><!-- RDLabel: "Retrieving data" -->
1048
- <p>Here is an example of how to retrieve the scientific name of the clades.</p>
1051
+ <p>Here is an example of how to retrieve the scientific name of the clades included in each tree.</p>
1049
1052
  <pre>require 'bio'
1050
1053
 
1051
1054
  phyloxml = Bio::PhyloXML::Parser.open('ncbi_taxonomy_mollusca.xml')
@@ -1086,7 +1089,7 @@ end
1086
1089
  #aggtcgcggcctgtggaagtcctctcct
1087
1090
  #taaatcgc--cccgtgg-agtccc-cct</pre>
1088
1091
  <h2><a name="label-29" id="label-29">The BioRuby example programs</a></h2><!-- RDLabel: "The BioRuby example programs" -->
1089
- <p>Some sample programs are stored in ./samples/ directory. Run for example:</p>
1092
+ <p>Some sample programs are stored in the ./sample/ directory. For example, the na2aa.rb program (which translates a nucleic acid sequence into an amino acid sequence) can be run using:</p>
1090
1093
  <pre>./sample/na2aa.rb test/data/fasta/example1.txt </pre>
1091
1094
  <h2><a name="label-30" id="label-30">Unit testing and doctests</a></h2><!-- RDLabel: "Unit testing and doctests" -->
1092
1095
  <p>BioRuby comes with an extensive testing framework with over 1300 tests and 2700
@@ -1098,23 +1101,23 @@ in this tutorial to doctest - more info upcoming.</p>
1098
1101
  <h2><a name="label-31" id="label-31">Further reading</a></h2><!-- RDLabel: "Further reading" -->
1099
1102
  <p>See the BioRuby in anger Wiki. A lot of BioRuby's documentation exists in the
1100
1103
  source code and unit tests. To really dive in you will need the latest source
1101
- code tree. The embedded rdoc documentation can be viewed online at
1104
+ code tree. The embedded rdoc documentation for the BioRuby source code can be viewed online at
1102
1105
  <a href="http://bioruby.org/rdoc/">&lt;URL:http://bioruby.org/rdoc/&gt;</a>.</p>
1103
1106
  <h2><a name="label-32" id="label-32">BioRuby Shell</a></h2><!-- RDLabel: "BioRuby Shell" -->
1104
- <p>The BioRuby shell implementation you find in ./lib/bio/shell. It is very interesting
1107
+ <p>The BioRuby shell implementation is located in ./lib/bio/shell. It is very interesting
1105
1108
  as it uses IRB (the Ruby intepreter) which is a powerful environment described in
1106
- <a href="http://ruby-doc.org/docs/ProgrammingRuby/html/irb.html">Programming Ruby's irb chapter</a>. IRB commands can directly be typed in the shell, e.g.</p>
1109
+ <a href="http://ruby-doc.org/docs/ProgrammingRuby/html/irb.html">Programming Ruby's IRB chapter</a>. IRB commands can be typed directly into the shell, e.g.</p>
1107
1110
  <pre>bioruby!&gt; IRB.conf[:PROMPT_MODE]
1108
1111
  ==!&gt; :PROMPT_C</pre>
1109
- <p>optionally you also may want to install the optional Ruby readline support -
1112
+ <p>Additionally, you may want to install the optional Ruby readline support -
1110
1113
  with Debian libreadline-ruby. To edit a previous line you may have to press
1111
- line down (arrow down) first.</p>
1114
+ line down (down arrow) first.</p>
1112
1115
  <h1><a name="label-33" id="label-33">Helpful tools</a></h1><!-- RDLabel: "Helpful tools" -->
1113
1116
  <p>Apart from rdoc you may also want to use rtags - which allows jumping around
1114
1117
  source code by clicking on class and method names. </p>
1115
1118
  <pre>cd bioruby/lib
1116
1119
  rtags -R --vi</pre>
1117
- <p>For a tutorial see <a href="http://rtags.rubyforge.org/">&lt;URL:http://rtags.rubyforge.org/&gt;</a></p>
1120
+ <p>For a tutorial, see the <a href="http://rtags.rubyforge.org/">rtags project page</a>.</p>
1118
1121
  <h1><a name="label-34" id="label-34">APPENDIX</a></h1><!-- RDLabel: "APPENDIX" -->
1119
1122
  <h2><a name="label-35" id="label-35">KEGG API</a></h2><!-- RDLabel: "KEGG API" -->
1120
1123
  <p>Please refer to KEGG_API.rd.ja (English version: <a href="http://www.genome.jp/kegg/soap/doc/keggapi_manual.html">&lt;URL:http://www.genome.jp/kegg/soap/doc/keggapi_manual.html&gt;</a> ) and</p>
@@ -1122,9 +1125,9 @@ rtags -R --vi</pre>
1122
1125
  <li><a href="http://www.genome.jp/kegg/soap/">&lt;URL:http://www.genome.jp/kegg/soap/&gt;</a></li>
1123
1126
  </ul>
1124
1127
  <h2><a name="label-36" id="label-36">Ruby Ensembl API</a></h2><!-- RDLabel: "Ruby Ensembl API" -->
1125
- <p>Ruby Ensembl API is a ruby API to the Ensembl database. It is NOT currently
1128
+ <p>The Ruby Ensembl API is a Ruby API to the Ensembl database. It is NOT currently
1126
1129
  included in the BioRuby archives. To install it, see
1127
- <a href="http://wiki.github.com/jandot/ruby-ensembl-api">&lt;URL:http://wiki.github.com/jandot/ruby-ensembl-api&gt;</a>
1130
+ <a href="http://wiki.github.com/jandot/ruby-ensembl-api">the ruby-ensembl-api wiki on GitHub</a>
1128
1131
  for more information.</p>
1129
1132
  <h3><a name="label-37" id="label-37">Gene Ontology (GO) through the Ruby Ensembl API</a></h3><!-- RDLabel: "Gene Ontology (GO) through the Ruby Ensembl API" -->
1130
1133
  <p>Gene Ontologies can be fetched through the Ruby Ensembl API package:</p>
@@ -1134,7 +1137,7 @@ infile = IO.readlines(ARGV.shift) # reading your comma-separated accession mappi
1134
1137
  infile.each do |line|
1135
1138
  accs = line.split(",") # Split the comma-sep.entries into an array
1136
1139
  drosphila_acc = accs.shift # the first entry is the Drosophila acc
1137
- mosq_acc = accs.shift # the second entry is you Mosq. acc
1140
+ mosq_acc = accs.shift # the second entry is your Mosq. acc
1138
1141
  gene = Ensembl::Core::Gene.find_by_stable_id(drosophila_acc)
1139
1142
  print "#{mosq_acc}"
1140
1143
  gene.go_terms.each do |go|
@@ -1145,9 +1148,9 @@ end</pre>
1145
1148
  homologues.</p>
1146
1149
  <h2><a name="label-38" id="label-38">Using BioPerl or BioPython from Ruby</a></h2><!-- RDLabel: "Using BioPerl or BioPython from Ruby" -->
1147
1150
  <p>At the moment there is no easy way of accessing BioPerl from Ruby. The best way, perhaps, is to create a Perl server that gets accessed through XML/RPC or SOAP.</p>
1148
- <h2><a name="label-39" id="label-39">Installing required external library</a></h2><!-- RDLabel: "Installing required external library" -->
1151
+ <h2><a name="label-39" id="label-39">Installing required external libraries</a></h2><!-- RDLabel: "Installing required external libraries" -->
1149
1152
  <p>At this point for using BioRuby no additional libraries are needed, except if
1150
- you are using Bio::PhyloXML module. Then you have to install libxml-ruby.</p>
1153
+ you are using the Bio::PhyloXML module; then you have to install libxml-ruby.</p>
1151
1154
  <p>This may change, so keep an eye on the Bioruby website. Also when
1152
1155
  a package is missing BioRuby should show an informative message.</p>
1153
1156
  <p>At this point installing third party Ruby packages can be a bit
@@ -1155,16 +1158,16 @@ painful, as the gem standard for packages evolved late and some still
1155
1158
  force you to copy things by hand. Therefore read the README's
1156
1159
  carefully that come with each package.</p>
1157
1160
  <h3><a name="label-40" id="label-40">Installing libxml-ruby</a></h3><!-- RDLabel: "Installing libxml-ruby" -->
1158
- <p>The simplest way is to use gem packaging system.</p>
1161
+ <p>The simplest way is to use the RubyGems packaging system:</p>
1159
1162
  <pre>gem install -r libxml-ruby</pre>
1160
1163
  <p>If you get `require': no such file to load - mkmf (LoadError) error then do</p>
1161
1164
  <pre>sudo apt-get install ruby-dev</pre>
1162
- <p>If you have other problems with installation, then see <a href="http://libxml.rubyforge.org/install.xml">&lt;URL:http://libxml.rubyforge.org/install.xml&gt;</a> </p>
1165
+ <p>If you have other problems with installation, then see <a href="http://libxml.rubyforge.org/install.xml">&lt;URL:http://libxml.rubyforge.org/install.xml&gt;</a>.</p>
1163
1166
  <h2><a name="label-41" id="label-41">Trouble shooting</a></h2><!-- RDLabel: "Trouble shooting" -->
1164
1167
  <ul>
1165
1168
  <li>Error: in `require': no such file to load -- bio (LoadError)</li>
1166
1169
  </ul>
1167
- <p>Ruby fails to find the BioRuby libraries - add it to the RUBYLIB path, or pass
1170
+ <p>Ruby cannot find the BioRuby libraries. Add the BioRuby lib directory to the RUBYLIB path, or pass
1168
1171
  it to the interpeter. For example:</p>
1169
1172
  <pre>ruby -I$BIORUBYPATH/lib yourprogram.rb</pre>
1170
1173
  <h2><a name="label-42" id="label-42">Modifying this page</a></h2><!-- RDLabel: "Modifying this page" -->