RubyGems - bio - Versions diffs - 1.3.1 → 1.4.0 - Mend

bio 1.3.1 → 1.4.0

Files changed (303)

data/ChangeLog +2105 -3728
data/KNOWN_ISSUES.rdoc +35 -3
data/README.rdoc +8 -2
data/RELEASE_NOTES.rdoc +166 -0
data/bin/bioruby +4 -1
data/bioruby.gemspec +146 -1
data/bioruby.gemspec.erb +3 -1
data/doc/ChangeLog-before-1.3.1 +3961 -0
data/doc/Tutorial.rd +154 -22
data/doc/Tutorial.rd.html +125 -68
data/lib/bio.rb +21 -6
data/lib/bio/appl/bl2seq/report.rb +11 -202
data/lib/bio/appl/blast/format0.rb +0 -193
data/lib/bio/appl/blast/report.rb +2 -147
data/lib/bio/appl/blast/wublast.rb +0 -208
data/lib/bio/appl/fasta.rb +4 -19
data/lib/bio/appl/fasta/format10.rb +0 -14
data/lib/bio/appl/genscan/report.rb +0 -176
data/lib/bio/appl/hmmer.rb +1 -15
data/lib/bio/appl/hmmer/report.rb +0 -100
data/lib/bio/appl/meme/mast.rb +156 -0
data/lib/bio/appl/meme/mast/report.rb +91 -0
data/lib/bio/appl/meme/motif.rb +48 -0
data/lib/bio/appl/psort.rb +0 -111
data/lib/bio/appl/psort/report.rb +1 -45
data/lib/bio/appl/pts1.rb +2 -4
data/lib/bio/appl/sosui/report.rb +5 -54
data/lib/bio/appl/targetp/report.rb +1 -104
data/lib/bio/appl/tmhmm/report.rb +0 -36
data/lib/bio/command.rb +94 -10
data/lib/bio/data/aa.rb +1 -77
data/lib/bio/data/codontable.rb +1 -95
data/lib/bio/data/na.rb +1 -26
data/lib/bio/db/aaindex.rb +1 -38
data/lib/bio/db/fasta.rb +1 -134
data/lib/bio/db/fasta/format_qual.rb +204 -0
data/lib/bio/db/fasta/qual.rb +102 -0
data/lib/bio/db/fastq.rb +645 -0
data/lib/bio/db/fastq/fastq_to_biosequence.rb +40 -0
data/lib/bio/db/fastq/format_fastq.rb +175 -0
data/lib/bio/db/genbank/genbank.rb +1 -86
data/lib/bio/db/gff.rb +0 -17
data/lib/bio/db/go.rb +4 -72
data/lib/bio/db/kegg/common.rb +112 -0
data/lib/bio/db/kegg/compound.rb +29 -20
data/lib/bio/db/kegg/drug.rb +74 -34
data/lib/bio/db/kegg/enzyme.rb +26 -5
data/lib/bio/db/kegg/genes.rb +128 -15
data/lib/bio/db/kegg/genome.rb +3 -41
data/lib/bio/db/kegg/glycan.rb +19 -24
data/lib/bio/db/kegg/orthology.rb +16 -56
data/lib/bio/db/kegg/reaction.rb +81 -28
data/lib/bio/db/kegg/taxonomy.rb +1 -52
data/lib/bio/db/litdb.rb +1 -16
data/lib/bio/db/phyloxml/phyloxml.xsd +582 -0
data/lib/bio/db/phyloxml/phyloxml_elements.rb +1174 -0
data/lib/bio/db/phyloxml/phyloxml_parser.rb +954 -0
data/lib/bio/db/phyloxml/phyloxml_writer.rb +228 -0
data/lib/bio/db/prosite.rb +2 -95
data/lib/bio/db/rebase.rb +5 -6
data/lib/bio/db/sanger_chromatogram/abif.rb +120 -0
data/lib/bio/db/sanger_chromatogram/chromatogram.rb +133 -0
data/lib/bio/db/sanger_chromatogram/chromatogram_to_biosequence.rb +32 -0
data/lib/bio/db/sanger_chromatogram/scf.rb +210 -0
data/lib/bio/io/das.rb +0 -44
data/lib/bio/io/ddbjxml.rb +1 -181
data/lib/bio/io/flatfile.rb +1 -7
data/lib/bio/io/flatfile/autodetection.rb +6 -0
data/lib/bio/io/keggapi.rb +0 -442
data/lib/bio/io/ncbirest.rb +130 -132
data/lib/bio/io/ncbisoap.rb +2 -1
data/lib/bio/io/pubmed.rb +0 -88
data/lib/bio/location.rb +0 -73
data/lib/bio/pathway.rb +0 -171
data/lib/bio/sequence.rb +18 -1
data/lib/bio/sequence/adapter.rb +3 -0
data/lib/bio/sequence/format.rb +16 -0
data/lib/bio/sequence/quality_score.rb +205 -0
data/lib/bio/tree.rb +70 -5
data/lib/bio/util/restriction_enzyme/single_strand.rb +3 -2
data/lib/bio/util/sirna.rb +1 -23
data/lib/bio/version.rb +1 -1
data/sample/demo_aaindex.rb +67 -0
data/sample/demo_aminoacid.rb +101 -0
data/sample/demo_bl2seq_report.rb +220 -0
data/sample/demo_blast_report.rb +285 -0
data/sample/demo_codontable.rb +119 -0
data/sample/demo_das.rb +105 -0
data/sample/demo_ddbjxml.rb +212 -0
data/sample/demo_fasta_remote.rb +51 -0
data/sample/demo_fastaformat.rb +105 -0
data/sample/demo_genbank.rb +132 -0
data/sample/demo_genscan_report.rb +202 -0
data/sample/demo_gff1.rb +49 -0
data/sample/demo_go.rb +98 -0
data/sample/demo_hmmer_report.rb +149 -0
data/sample/demo_kegg_compound.rb +57 -0
data/sample/demo_kegg_drug.rb +65 -0
data/sample/demo_kegg_genome.rb +74 -0
data/sample/demo_kegg_glycan.rb +72 -0
data/sample/demo_kegg_orthology.rb +62 -0
data/sample/demo_kegg_reaction.rb +66 -0
data/sample/demo_kegg_taxonomy.rb +92 -0
data/sample/demo_keggapi.rb +502 -0
data/sample/demo_litdb.rb +42 -0
data/sample/demo_locations.rb +99 -0
data/sample/demo_ncbi_rest.rb +130 -0
data/sample/demo_nucleicacid.rb +49 -0
data/sample/demo_pathway.rb +196 -0
data/sample/demo_prosite.rb +120 -0
data/sample/demo_psort.rb +138 -0
data/sample/demo_psort_report.rb +70 -0
data/sample/demo_pubmed.rb +118 -0
data/sample/demo_sirna.rb +63 -0
data/sample/demo_sosui_report.rb +89 -0
data/sample/demo_targetp_report.rb +135 -0
data/sample/demo_tmhmm_report.rb +68 -0
data/sample/pmfetch.rb +13 -4
data/sample/pmsearch.rb +15 -4
data/sample/test_phyloxml_big.rb +205 -0
data/test/bioruby_test_helper.rb +61 -0
data/test/data/KEGG/1.1.1.1.enzyme +935 -0
data/test/data/KEGG/C00025.compound +102 -0
data/test/data/KEGG/D00063.drug +104 -0
data/test/data/KEGG/G00024.glycan +47 -0
data/test/data/KEGG/G01366.glycan +18 -0
data/test/data/KEGG/K02338.orthology +902 -0
data/test/data/KEGG/R00006.reaction +14 -0
data/test/data/fastq/README.txt +109 -0
data/test/data/fastq/error_diff_ids.fastq +20 -0
data/test/data/fastq/error_double_qual.fastq +22 -0
data/test/data/fastq/error_double_seq.fastq +22 -0
data/test/data/fastq/error_long_qual.fastq +20 -0
data/test/data/fastq/error_no_qual.fastq +20 -0
data/test/data/fastq/error_qual_del.fastq +20 -0
data/test/data/fastq/error_qual_escape.fastq +20 -0
data/test/data/fastq/error_qual_null.fastq +0 -0
data/test/data/fastq/error_qual_space.fastq +21 -0
data/test/data/fastq/error_qual_tab.fastq +21 -0
data/test/data/fastq/error_qual_unit_sep.fastq +20 -0
data/test/data/fastq/error_qual_vtab.fastq +20 -0
data/test/data/fastq/error_short_qual.fastq +20 -0
data/test/data/fastq/error_spaces.fastq +20 -0
data/test/data/fastq/error_tabs.fastq +21 -0
data/test/data/fastq/error_trunc_at_plus.fastq +19 -0
data/test/data/fastq/error_trunc_at_qual.fastq +19 -0
data/test/data/fastq/error_trunc_at_seq.fastq +18 -0
data/test/data/fastq/error_trunc_in_plus.fastq +19 -0
data/test/data/fastq/error_trunc_in_qual.fastq +20 -0
data/test/data/fastq/error_trunc_in_seq.fastq +18 -0
data/test/data/fastq/error_trunc_in_title.fastq +17 -0
data/test/data/fastq/illumina_full_range_as_illumina.fastq +8 -0
data/test/data/fastq/illumina_full_range_as_sanger.fastq +8 -0
data/test/data/fastq/illumina_full_range_as_solexa.fastq +8 -0
data/test/data/fastq/illumina_full_range_original_illumina.fastq +8 -0
data/test/data/fastq/longreads_as_illumina.fastq +40 -0
data/test/data/fastq/longreads_as_sanger.fastq +40 -0
data/test/data/fastq/longreads_as_solexa.fastq +40 -0
data/test/data/fastq/longreads_original_sanger.fastq +120 -0
data/test/data/fastq/misc_dna_as_illumina.fastq +16 -0
data/test/data/fastq/misc_dna_as_sanger.fastq +16 -0
data/test/data/fastq/misc_dna_as_solexa.fastq +16 -0
data/test/data/fastq/misc_dna_original_sanger.fastq +16 -0
data/test/data/fastq/misc_rna_as_illumina.fastq +16 -0
data/test/data/fastq/misc_rna_as_sanger.fastq +16 -0
data/test/data/fastq/misc_rna_as_solexa.fastq +16 -0
data/test/data/fastq/misc_rna_original_sanger.fastq +16 -0
data/test/data/fastq/sanger_full_range_as_illumina.fastq +8 -0
data/test/data/fastq/sanger_full_range_as_sanger.fastq +8 -0
data/test/data/fastq/sanger_full_range_as_solexa.fastq +8 -0
data/test/data/fastq/sanger_full_range_original_sanger.fastq +8 -0
data/test/data/fastq/solexa_full_range_as_illumina.fastq +8 -0
data/test/data/fastq/solexa_full_range_as_sanger.fastq +8 -0
data/test/data/fastq/solexa_full_range_as_solexa.fastq +8 -0
data/test/data/fastq/solexa_full_range_original_solexa.fastq +8 -0
data/test/data/fastq/wrapping_as_illumina.fastq +12 -0
data/test/data/fastq/wrapping_as_sanger.fastq +12 -0
data/test/data/fastq/wrapping_as_solexa.fastq +12 -0
data/test/data/fastq/wrapping_original_sanger.fastq +24 -0
data/test/data/meme/db +0 -0
data/test/data/meme/mast +0 -0
data/test/data/meme/mast.out +13 -0
data/test/data/meme/meme.out +3 -0
data/test/data/phyloxml/apaf.xml +666 -0
data/test/data/phyloxml/bcl_2.xml +2097 -0
data/test/data/phyloxml/made_up.xml +144 -0
data/test/data/phyloxml/ncbi_taxonomy_mollusca_short.xml +65 -0
data/test/data/phyloxml/phyloxml_examples.xml +415 -0
data/test/data/sanger_chromatogram/test_chromatogram_abif.ab1 +0 -0
data/test/data/sanger_chromatogram/test_chromatogram_scf_v2.scf +0 -0
data/test/data/sanger_chromatogram/test_chromatogram_scf_v3.scf +0 -0
data/test/functional/bio/appl/test_pts1.rb +7 -5
data/test/functional/bio/io/test_ensembl.rb +4 -3
data/test/functional/bio/io/test_pubmed.rb +9 -3
data/test/functional/bio/io/test_soapwsdl.rb +5 -4
data/test/functional/bio/io/test_togows.rb +5 -4
data/test/functional/bio/sequence/test_output_embl.rb +6 -4
data/test/functional/bio/test_command.rb +54 -5
data/test/runner.rb +5 -3
data/test/unit/bio/appl/bl2seq/test_report.rb +5 -4
data/test/unit/bio/appl/blast/test_ncbioptions.rb +4 -2
data/test/unit/bio/appl/blast/test_report.rb +5 -4
data/test/unit/bio/appl/blast/test_rpsblast.rb +5 -4
data/test/unit/bio/appl/gcg/test_msf.rb +5 -5
data/test/unit/bio/appl/genscan/test_report.rb +8 -9
data/test/unit/bio/appl/hmmer/test_report.rb +5 -4
data/test/unit/bio/appl/iprscan/test_report.rb +6 -5
data/test/unit/bio/appl/mafft/test_report.rb +6 -5
data/test/unit/bio/appl/meme/mast/test_report.rb +46 -0
data/test/unit/bio/appl/meme/test_mast.rb +103 -0
data/test/unit/bio/appl/meme/test_motif.rb +38 -0
data/test/unit/bio/appl/paml/codeml/test_rates.rb +5 -4
data/test/unit/bio/appl/paml/codeml/test_report.rb +5 -4
data/test/unit/bio/appl/paml/test_codeml.rb +5 -4
data/test/unit/bio/appl/sim4/test_report.rb +5 -4
data/test/unit/bio/appl/sosui/test_report.rb +6 -5
data/test/unit/bio/appl/targetp/test_report.rb +5 -3
data/test/unit/bio/appl/test_blast.rb +5 -4
data/test/unit/bio/appl/test_fasta.rb +4 -2
data/test/unit/bio/appl/test_pts1.rb +4 -2
data/test/unit/bio/appl/tmhmm/test_report.rb +6 -5
data/test/unit/bio/data/test_aa.rb +5 -3
data/test/unit/bio/data/test_codontable.rb +5 -4
data/test/unit/bio/data/test_na.rb +5 -3
data/test/unit/bio/db/biosql/tc_biosql.rb +5 -1
data/test/unit/bio/db/embl/test_common.rb +4 -2
data/test/unit/bio/db/embl/test_embl.rb +6 -6
data/test/unit/bio/db/embl/test_embl_rel89.rb +6 -6
data/test/unit/bio/db/embl/test_embl_to_bioseq.rb +7 -8
data/test/unit/bio/db/embl/test_sptr.rb +6 -8
data/test/unit/bio/db/embl/test_uniprot.rb +6 -5
data/test/unit/bio/db/fasta/test_format_qual.rb +346 -0
data/test/unit/bio/db/kegg/test_compound.rb +146 -0
data/test/unit/bio/db/kegg/test_drug.rb +194 -0
data/test/unit/bio/db/kegg/test_enzyme.rb +241 -0
data/test/unit/bio/db/kegg/test_genes.rb +32 -4
data/test/unit/bio/db/kegg/test_glycan.rb +260 -0
data/test/unit/bio/db/kegg/test_orthology.rb +50 -0
data/test/unit/bio/db/kegg/test_reaction.rb +96 -0
data/test/unit/bio/db/pdb/test_pdb.rb +4 -2
data/test/unit/bio/db/sanger_chromatogram/test_abif.rb +76 -0
data/test/unit/bio/db/sanger_chromatogram/test_scf.rb +98 -0
data/test/unit/bio/db/test_aaindex.rb +6 -6
data/test/unit/bio/db/test_fasta.rb +5 -46
data/test/unit/bio/db/test_fastq.rb +829 -0
data/test/unit/bio/db/test_gff.rb +4 -2
data/test/unit/bio/db/test_lasergene.rb +7 -5
data/test/unit/bio/db/test_medline.rb +4 -2
data/test/unit/bio/db/test_newick.rb +6 -6
data/test/unit/bio/db/test_nexus.rb +4 -2
data/test/unit/bio/db/test_phyloxml.rb +769 -0
data/test/unit/bio/db/test_phyloxml_writer.rb +328 -0
data/test/unit/bio/db/test_prosite.rb +6 -5
data/test/unit/bio/db/test_qual.rb +63 -0
data/test/unit/bio/db/test_rebase.rb +5 -3
data/test/unit/bio/db/test_soft.rb +7 -6
data/test/unit/bio/io/flatfile/test_autodetection.rb +6 -7
data/test/unit/bio/io/flatfile/test_buffer.rb +6 -5
data/test/unit/bio/io/flatfile/test_splitter.rb +4 -4
data/test/unit/bio/io/test_ddbjxml.rb +4 -3
data/test/unit/bio/io/test_ensembl.rb +5 -3
data/test/unit/bio/io/test_fastacmd.rb +4 -3
data/test/unit/bio/io/test_flatfile.rb +6 -5
data/test/unit/bio/io/test_soapwsdl.rb +4 -3
data/test/unit/bio/io/test_togows.rb +4 -2
data/test/unit/bio/sequence/test_aa.rb +5 -3
data/test/unit/bio/sequence/test_common.rb +4 -2
data/test/unit/bio/sequence/test_compat.rb +4 -2
data/test/unit/bio/sequence/test_dblink.rb +5 -3
data/test/unit/bio/sequence/test_na.rb +4 -2
data/test/unit/bio/sequence/test_quality_score.rb +330 -0
data/test/unit/bio/shell/plugin/test_seq.rb +5 -3
data/test/unit/bio/test_alignment.rb +5 -3
data/test/unit/bio/test_command.rb +4 -3
data/test/unit/bio/test_db.rb +5 -3
data/test/unit/bio/test_feature.rb +4 -2
data/test/unit/bio/test_location.rb +4 -2
data/test/unit/bio/test_map.rb +5 -3
data/test/unit/bio/test_pathway.rb +4 -2
data/test/unit/bio/test_reference.rb +4 -2
data/test/unit/bio/test_sequence.rb +5 -3
data/test/unit/bio/test_shell.rb +5 -3
data/test/unit/bio/test_tree.rb +6 -6
data/test/unit/bio/util/restriction_enzyme/analysis/test_calculated_cuts.rb +4 -2
data/test/unit/bio/util/restriction_enzyme/analysis/test_cut_ranges.rb +4 -2
data/test/unit/bio/util/restriction_enzyme/analysis/test_sequence_range.rb +4 -2
data/test/unit/bio/util/restriction_enzyme/double_stranded/test_aligned_strands.rb +4 -2
data/test/unit/bio/util/restriction_enzyme/double_stranded/test_cut_location_pair.rb +4 -2
data/test/unit/bio/util/restriction_enzyme/double_stranded/test_cut_location_pair_in_enzyme_notation.rb +4 -2
data/test/unit/bio/util/restriction_enzyme/double_stranded/test_cut_locations.rb +4 -2
data/test/unit/bio/util/restriction_enzyme/double_stranded/test_cut_locations_in_enzyme_notation.rb +4 -2
data/test/unit/bio/util/restriction_enzyme/single_strand/test_cut_locations_in_enzyme_notation.rb +4 -2
data/test/unit/bio/util/restriction_enzyme/test_analysis.rb +4 -2
data/test/unit/bio/util/restriction_enzyme/test_cut_symbol.rb +4 -2
data/test/unit/bio/util/restriction_enzyme/test_double_stranded.rb +4 -2
data/test/unit/bio/util/restriction_enzyme/test_single_strand.rb +17 -13
data/test/unit/bio/util/restriction_enzyme/test_single_strand_complement.rb +17 -13
data/test/unit/bio/util/restriction_enzyme/test_string_formatting.rb +4 -2
data/test/unit/bio/util/test_color_scheme.rb +5 -3
data/test/unit/bio/util/test_contingency_table.rb +5 -3
data/test/unit/bio/util/test_restriction_enzyme.rb +4 -2
data/test/unit/bio/util/test_sirna.rb +6 -4
metadata +147 -2

data/doc/Tutorial.rd CHANGED

@@ -1,12 +1,10 @@
 # This document is generated with a version of rd2html (part of Hiki)
 #
-# A possible test run could be from rdtool (on Debian package rdtool)
-#
-#   rd2 $BIORUBYPATH/doc/Tutorial.rd
+#   rd2 Tutorial.rd
 #
 # or with style sheet:
 #
-#   rd2 -r rd/rd2html-lib.rb --with-css=bioruby.css $BIORUBYPATH/doc/Tutorial.rd > ~/bioruby.html
+#   rd2 -r rd/rd2html-lib.rb --with-css=bioruby.css Tutorial.rd > Tutorial.rd.html
 #
 # in Debian:
 #
@@ -17,9 +15,18 @@
 # To add tests run Toshiaki's bioruby shell and paste in the query plus
 # results.
 #
-# To run the embedded Ruby doctests you can use the rubydoctest tool, part
-# of the bioruby-support repository at http://github.com/pjotrp/bioruby-support/
+# To run the embedded Ruby doctests you can use the rubydoctest tool, though
+# it needs a little conversion. Like:
+#
+#   cat Tutorial.rd | sed -e "s,bioruby>,>>," | sed "s,==>,=>," > Tutorial.rd.tmp
+#   rubydoctest Tutorial.rd.tmp
+#
+# Rubydoctest is useful to verify an example in this document (still) works
+#
 #
+#
+bioruby> $: << '../lib'
 =begin
 #doctest Testing bioruby
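
Note on the hunk above: the new header comment converts the tutorial for rubydoctest with two sed substitutions, rewriting the `bioruby>` prompt to `>>` and the `==>` result marker to `=>`. For readers without sed, the same rewrite can be sketched in plain Ruby. The helper name `to_rubydoctest` is hypothetical and not part of BioRuby or rubydoctest. Like the sed commands, which have no `g` flag, it replaces only the first match on each line.

```ruby
# Hypothetical helper (not part of BioRuby or rubydoctest): mirrors the two
# sed substitutions from the Tutorial.rd header comment.
def to_rubydoctest(text)
  text.each_line.map do |line|
    # sed "s,bioruby>,>>," then sed "s,==>,=>," (first match per line only)
    line.sub('bioruby>', '>>').sub('==>', '=>')
  end.join
end

# A made-up two-line transcript in the BioRuby shell style.
sample = <<~RD
  bioruby> a = 1 + 1
  ==> 2
RD

puts to_rubydoctest(sample)
```

To convert the whole file as the sed pipeline does, the result could be written out with `File.write('Tutorial.rd.tmp', to_rubydoctest(File.read('Tutorial.rd')))` before running `rubydoctest Tutorial.rd.tmp`.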
@@ -29,7 +36,7 @@
 * Copyright (C) 2001-2003 KATAYAMA Toshiaki <k .at. bioruby.org>
 * Copyright (C) 2005-2009 Pjotr Prins, Naohisa Goto and others
-This document was last modified: 2009/03/17
+This document was last modified: 2009/12/27
 Current editor: Pjotr Prins <p .at. bioruby.org>
 The latest version resides in the GIT source code repository:  ./doc/((<Tutorial.rd|URL:http://github.com/pjotrp/bioruby/raw/documentation/doc/Tutorial.rd>)).
@@ -202,7 +209,6 @@ use all methods on the subsequence. For example,
    bioruby> a
    ==> ["MHAIK", "HAIKL", "AIKLI", "IKLIP", "KLIPI", "LIPIR", "IPIRS", "PIRSS", "IRSSR", "RSSRS", "SSRSS", "SRSSK", "RSSKK", "SSKKK"]
 Finally, the window_search method returns the last leftover
 subsequence. This allows for example
@@ -785,19 +791,19 @@ which supports the "-m 0" default and "-m 7" XML type output format.
 * For example:
-   bioruby> blast_version = nil; result = []
-   bioruby> Bio::Blast.reports(File.new("../test/data/blast/blastp-multi.m7")) do |report|
-   bioruby>   blast_version = report.version
-   bioruby>   report.iterations.each do |itr|
-   bioruby>     itr.hits.each do |hit|
-   bioruby>       result.push hit.target_id
-   bioruby>     end
-   bioruby>   end
-   bioruby> end
-   bioruby> blast_version
-   ==> "blastp 2.2.18 [Mar-02-2008]"
-   bioruby> result
-   ==> ["BAB38768", "BAB38768", "BAB38769", "BAB37741"]
+    blast_version = nil; result = []
+    Bio::Blast.reports(File.new("../test/data/blast/blastp-multi.m7")) do |report|
+      blast_version = report.version
+      report.iterations.each do |itr|
+        itr.hits.each do |hit|
+          result.push hit.target_id
+        end
+      end
+    end
+    blast_version
+    # ==> "blastp 2.2.18 [Mar-02-2008]"
+    result
+    # ==> ["BAB38768", "BAB38768", "BAB38769", "BAB37741"]
 * another example:
@@ -843,6 +849,8 @@ When you write above routines, please send to the BioRuby project and
 they may be included.
 == Generate a reference list using PubMed (Bio::PubMed)
+=end
+(EDITORs NOTE: examples in this section do not work and should be rewritten.)
 Below script is an example which seaches PubMed and creates a reference list.
@@ -891,6 +899,7 @@ bold and italic font output.
 (EDITORs NOTE: do we have some simple object that can be queried for
 author, title etc.?)
+=begin
 Nowadays using NCBI E-Utils is recommended. Use Bio::PubMed.esearch
 and Bio::PubMed.efetch instead of above methods.
@@ -900,6 +909,11 @@ and Bio::PubMed.efetch instead of above methods.
     require 'bio'
+    # NCBI announces that queries without email address will return error
+    # after June 2010. When you modify the script, please enter your email
+    # address instead of the staff's.
+    Bio::NCBI.default_email = 'staff@bioruby.org'
     keywords = ARGV.join(' ')
     options = {
@@ -1199,6 +1213,110 @@ Bio::Fetch.query method.)
 to be written...
+= PhyloXML
+PhyloXML is an XML language for saving, analyzing and exchanging data of
+annotated phylogenetic trees. PhyloXML parser in BioRuby is implemented in
+Bio::PhyloXML::Parser and writer in Bio::PhyloXML::Writer.
+More information at www.phyloxml.org
+== Requirements
+In addition to BioRuby library you need a libxml ruby bindings. To install:
+  % gem install -r libxml-ruby
+For more information see ((<URL:http://libxml.rubyforge.org/install.xml>))
+== Parsing a file
+    require 'bio'
+    # Create new phyloxml parser
+    phyloxml = Bio::PhyloXML::Parser.open('example.xml')
+    # Print the names of all trees in the file
+    phyloxml.each do |tree|
+      puts tree.name
+    end
+If there are several trees in the file, you can access the one you wish by an index
+    tree = phyloxml[3]
+You can use all Bio::Tree methods on the tree, since PhyloXML::Tree inherits from Bio::Tree. For example,
+   tree.leaves.each do |node|
+     puts node.name
+   end
+PhyloXML files can hold additional information besides phylogenies at the end of the file. This info can be accessed through the 'other' array of the parser object.
+    phyloxml = Bio::PhyloXML::Parser.open('example.xml')
+    while tree = phyloxml.next_tree
+      # do stuff with trees
+    end
+    puts phyloxml.other
+== Writing a file
+    # Create new phyloxml writer
+    writer = Bio::PhyloXML::Writer.new('tree.xml')
+    # Write tree to the file tree.xml
+    writer.write(tree1)
+    # Add another tree to the file
+    writer.write(tree2)
+== Retrieving data
+Here is an example of how to retrieve the scientific name of the clades.
+    require 'bio'
+    phyloxml = Bio::PhyloXML::Parser.open('ncbi_taxonomy_mollusca.xml')
+    phyloxml.each do |tree|
+      tree.each_node do |node|
+        print "Scientific name: ", node.taxonomies[0].scientific_name, "\n"
+      end
+    end
+== Retrieving 'other' data
+    require 'bio'
+    phyloxml = Bio::PhyloXML::Parser.open('phyloxml_examples.xml')
+    while tree = phyloxml.next_tree
+     #do something with the trees
+    end
+    p phyloxml.other
+    puts "\n"
+    #=> output is an object representation
+    #Print in a readable way
+    puts phyloxml.other[0].to_xml, "\n"
+    #=>:
+    #
+    #<align:alignment xmlns:align="http://example.org/align">
+    #  <seq name="A">acgtcgcggcccgtggaagtcctctcct</seq>
+    #  <seq name="B">aggtcgcggcctgtggaagtcctctcct</seq>
+    #  <seq name="C">taaatcgc--cccgtgg-agtccc-cct</seq>
+    #</align:alignment>
+    #Once we know whats there, lets output just sequences
+    phyloxml.other[0].children.each do |node|
+     puts node.value
+    end
+    #=>
+    #
+    #acgtcgcggcccgtggaagtcctctcct
+    #aggtcgcggcctgtggaagtcctctcct
+    #taaatcgc--cccgtgg-agtccc-cct
 == The BioRuby example programs
 Some sample programs are stored in ./samples/ directory. Run for example:
@@ -1296,7 +1414,9 @@ At the moment there is no easy way of accessing BioPerl from Ruby. The best way,
 == Installing required external library
-At this point for using BioRuby no additional libraries are needed.
+At this point for using BioRuby no additional libraries are needed, except if
+you are using Bio::PhyloXML module. Then you have to install libxml-ruby.
 This may change, so keep an eye on the Bioruby website. Also when
 a package is missing BioRuby should show an informative message.
@@ -1305,6 +1425,18 @@ painful, as the gem standard for packages evolved late and some still
 force you to copy things by hand. Therefore read the README's
 carefully that come with each package.
+=== Installing libxml-ruby
+The simplest way is to use gem packaging system.
+  gem install -r libxml-ruby
+If you get `require': no such file to load - mkmf (LoadError) error then do
+  sudo apt-get install ruby-dev
+If you have other problems with installation, then see ((<URL:http://libxml.rubyforge.org/install.xml>))
 == Trouble shooting
 * Error: in `require': no such file to load -- bio (LoadError)
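
Note on the hunks above: the tutorial now says that Bio::PhyloXML needs libxml-ruby and that a missing package should produce an informative message. A minimal sketch of such a check follows. It assumes the bindings are loaded with `require 'libxml'`, the name the libxml-ruby gem is generally known to provide. The helper `libxml_available?` is hypothetical and not a BioRuby method.

```ruby
# Hypothetical helper (not BioRuby API): reports whether the libxml-ruby
# bindings can be loaded, instead of letting a LoadError propagate.
def libxml_available?
  require 'libxml' # assumption: the feature name provided by libxml-ruby
  true
rescue LoadError
  false
end

if libxml_available?
  puts 'libxml-ruby found; Bio::PhyloXML should be usable.'
else
  warn 'libxml-ruby is missing; try: gem install -r libxml-ruby'
end
```

The script runs either way: it prints a confirmation when the bindings load and writes the install hint from the tutorial to standard error when they do not.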

data/doc/Tutorial.rd.html CHANGED

@@ -13,7 +13,7 @@
 <li>Copyright (C) 2001-2003 KATAYAMA Toshiaki &lt;k .at. bioruby.org&gt;</li>
 <li>Copyright (C) 2005-2009 Pjotr Prins, Naohisa Goto and others</li>
 </ul>
-<p>This document was last modified: 2009/03/17
+<p>This document was last modified: 2009/12/27
 Current editor: Pjotr Prins &lt;p .at. bioruby.org&gt;</p>
 <p>The latest version resides in the GIT source code repository:  ./doc/<a href="http://github.com/pjotrp/bioruby/raw/documentation/doc/Tutorial.rd">Tutorial.rd</a>.</p>
 <h2><a name="label-1" id="label-1">Introduction</a></h2><!-- RDLabel: "Introduction" -->
@@ -652,19 +652,19 @@ Bio::Blast factory object. For this purpose use Bio::Blast.reports,
 which supports the "-m 0" default and "-m 7" XML type output format.</p>
 <ul>
 <li><p>For example: </p>
-<pre>bioruby&gt; blast_version = nil; result = []
-bioruby&gt; Bio::Blast.reports(File.new("../test/data/blast/blastp-multi.m7")) do |report|
-bioruby&gt;   blast_version = report.version
-bioruby&gt;   report.iterations.each do |itr|
-bioruby&gt;     itr.hits.each do |hit|
-bioruby&gt;       result.push hit.target_id
-bioruby&gt;     end
-bioruby&gt;   end
-bioruby&gt; end
-bioruby&gt; blast_version
-==&gt; "blastp 2.2.18 [Mar-02-2008]"
-bioruby&gt; result
-==&gt; ["BAB38768", "BAB38768", "BAB38769", "BAB37741"]</pre></li>
+<pre>blast_version = nil; result = []
+Bio::Blast.reports(File.new("../test/data/blast/blastp-multi.m7")) do |report|
+  blast_version = report.version
+  report.iterations.each do |itr|
+    itr.hits.each do |hit|
+      result.push hit.target_id
+    end
+  end
+end
+blast_version
+# ==&gt; "blastp 2.2.18 [Mar-02-2008]"
+result
+# ==&gt; ["BAB38768", "BAB38768", "BAB38769", "BAB37741"]</pre></li>
 <li><p>another example:</p>
 <pre>require 'bio'
 Bio::Blast.reports(ARGF) do |report|
@@ -699,49 +699,17 @@ Bio::Blast::Report.new(or Bio::Blast::Default::Report.new):</p>
 <p>When you write above routines, please send to the BioRuby project and
 they may be included.</p>
 <h2><a name="label-14" id="label-14">Generate a reference list using PubMed (Bio::PubMed)</a></h2><!-- RDLabel: "Generate a reference list using PubMed (Bio::PubMed)" -->
-<p>Below script is an example which seaches PubMed and creates a reference list.</p>
-<pre>ARGV.each do |id|
-  entry = Bio::PubMed.query(id)     # searches PubMed and get entry
-  medline = Bio::MEDLINE.new(entry) # creates Bio::MEDLINE object from entry text
-  reference = medline.reference     # converts into Bio::Reference object
-  puts reference.bibtex             # shows BibTeX formatted text
-end</pre>
-<p>We named the script pmfetch.rb.</p>
-<pre>% ./pmfetch.rb 11024183 10592278 10592173</pre>
-<p>To give some PubMed ID (PMID) in arguments, the script retrieves informations
-from NCBI, parses MEDLINE format text, converts into BibTeX format and
-shows them.</p>
-<p>A keyword search is also available.</p>
-<pre>#!/usr/bin/env ruby
-require 'bio'
-# Concatinates argument keyword list to a string
-keywords = ARGV.join(' ')
-# PubMed keyword search
-entries = Bio::PubMed.search(keywords)
-entries.each do |entry|
-  medline = Bio::MEDLINE.new(entry) # creates Bio::MEDLINE object from text
-  reference = medline.reference     # converts into Bio::Reference object
-  puts reference.bibtex             # shows BibTeX format text
-end</pre>
-<p>We named the script pmsearch.rb.</p>
-<pre>% ./pmsearch.rb genome bioinformatics</pre>
-<p>To give keywords in arguments, the script searches PubMed by given
-keywords and shows bibliography informations in a BibTex format. Other
-output formats are also avaialble like the bibitem method described
-below. Some journal formats like nature and nar can be used, but lack
-bold and italic font output.</p>
-<p>(EDITORs NOTE: do we have some simple object that can be queried for
-author, title etc.?)</p>
 <p>Nowadays using NCBI E-Utils is recommended. Use Bio::PubMed.esearch
 and Bio::PubMed.efetch instead of above methods.</p>
 <pre>#!/usr/bin/env ruby
 require 'bio'
+# NCBI announces that queries without email address will return error
+# after June 2010. When you modify the script, please enter your email
+# address instead of the staff's.
+Bio::NCBI.default_email = 'staff@bioruby.org'
 keywords = ARGV.join(' ')
 options = {
@@ -979,22 +947,104 @@ from other BioFetch servers, we used bioruby.org server with
 Bio::Fetch.query method.)</p>
 <h2><a name="label-22" id="label-22">BioSQL</a></h2><!-- RDLabel: "BioSQL" -->
 <p>to be written...</p>
-<h2><a name="label-23" id="label-23">The BioRuby example programs</a></h2><!-- RDLabel: "The BioRuby example programs" -->
+<h1><a name="label-23" id="label-23">PhyloXML</a></h1><!-- RDLabel: "PhyloXML" -->
+<p>PhyloXML is an XML language for saving, analyzing and exchanging data of
+annotated phylogenetic trees. PhyloXML parser in BioRuby is implemented in
+Bio::PhyloXML::Parser and writer in Bio::PhyloXML::Writer.
+More information at www.phyloxml.org</p>
+<h2><a name="label-24" id="label-24">Requirements</a></h2><!-- RDLabel: "Requirements" -->
+<p>In addition to BioRuby library you need a libxml ruby bindings. To install:</p>
+<pre>% gem install -r libxml-ruby</pre>
+<p>For more information see <a href="http://libxml.rubyforge.org/install.xml">&lt;URL:http://libxml.rubyforge.org/install.xml&gt;</a></p>
+<h2><a name="label-25" id="label-25">Parsing a file</a></h2><!-- RDLabel: "Parsing a file" -->
+<pre>require 'bio'
+# Create new phyloxml parser
+phyloxml = Bio::PhyloXML::Parser.open('example.xml')
+# Print the names of all trees in the file
+phyloxml.each do |tree|
+  puts tree.name
+end</pre>
+<p>If there are several trees in the file, you can access the one you wish by an index</p>
+<pre>tree = phyloxml[3]</pre>
+<p>You can use all Bio::Tree methods on the tree, since PhyloXML::Tree inherits from Bio::Tree. For example,</p>
+<pre>tree.leaves.each do |node|
+  puts node.name
+end</pre>
+<p>PhyloXML files can hold additional information besides phylogenies at the end of the file. This info can be accessed through the 'other' array of the parser object.</p>
+<pre>phyloxml = Bio::PhyloXML::Parser.open('example.xml')
+while tree = phyloxml.next_tree
+  # do stuff with trees
+end
+puts phyloxml.other</pre>
+<h2><a name="label-26" id="label-26">Writing a file</a></h2><!-- RDLabel: "Writing a file" -->
+<pre># Create new phyloxml writer
+writer = Bio::PhyloXML::Writer.new('tree.xml')
+# Write tree to the file tree.xml
+writer.write(tree1)
+# Add another tree to the file
+writer.write(tree2)</pre>
+<h2><a name="label-27" id="label-27">Retrieving data</a></h2><!-- RDLabel: "Retrieving data" -->
+<p>Here is an example of how to retrieve the scientific name of the clades.</p>
+<pre>require 'bio'
+phyloxml = Bio::PhyloXML::Parser.open('ncbi_taxonomy_mollusca.xml')
+phyloxml.each do |tree|
+  tree.each_node do |node|
+    print "Scientific name: ", node.taxonomies[0].scientific_name, "\n"
+  end
+end</pre>
+<h2><a name="label-28" id="label-28">Retrieving 'other' data</a></h2><!-- RDLabel: "Retrieving 'other' data" -->
+<pre>require 'bio'
+phyloxml = Bio::PhyloXML::Parser.open('phyloxml_examples.xml')
+while tree = phyloxml.next_tree
+ #do something with the trees
+end
+p phyloxml.other
+puts "\n"
+#=&gt; output is an object representation
+#Print in a readable way
+puts phyloxml.other[0].to_xml, "\n"
+#=&gt;:
+#
+#&lt;align:alignment xmlns:align="http://example.org/align"&gt;
+#  &lt;seq name="A"&gt;acgtcgcggcccgtggaagtcctctcct&lt;/seq&gt;
+#  &lt;seq name="B"&gt;aggtcgcggcctgtggaagtcctctcct&lt;/seq&gt;
+#  &lt;seq name="C"&gt;taaatcgc--cccgtgg-agtccc-cct&lt;/seq&gt;
+#&lt;/align:alignment&gt;
+#Once we know whats there, lets output just sequences
+phyloxml.other[0].children.each do |node|
+ puts node.value
+end
+#=&gt;
+#
+#acgtcgcggcccgtggaagtcctctcct
+#aggtcgcggcctgtggaagtcctctcct
+#taaatcgc--cccgtgg-agtccc-cct</pre>
+<h2><a name="label-29" id="label-29">The BioRuby example programs</a></h2><!-- RDLabel: "The BioRuby example programs" -->
 <p>Some sample programs are stored in ./samples/ directory. Run for example:</p>
 <pre>./sample/na2aa.rb test/data/fasta/example1.txt </pre>
-<h2><a name="label-24" id="label-24">Unit testing and doctests</a></h2><!-- RDLabel: "Unit testing and doctests" -->
+<h2><a name="label-30" id="label-30">Unit testing and doctests</a></h2><!-- RDLabel: "Unit testing and doctests" -->
 <p>BioRuby comes with an extensive testing framework with over 1300 tests and 2700
 assertions. To run the unit tests:</p>
 <pre>cd test
 ruby runner.rb</pre>
 <p>We have also started with doctest for Ruby. We are porting the examples
 in this tutorial to doctest - more info upcoming.</p>
-<h2><a name="label-25" id="label-25">Further reading</a></h2><!-- RDLabel: "Further reading" -->
+<h2><a name="label-31" id="label-31">Further reading</a></h2><!-- RDLabel: "Further reading" -->
 <p>See the BioRuby in anger Wiki.  A lot of BioRuby's documentation exists in the
 source code and unit tests. To really dive in you will need the latest source
 code tree. The embedded rdoc documentation can be viewed online at
 <a href="http://bioruby.org/rdoc/">&lt;URL:http://bioruby.org/rdoc/&gt;</a>.</p>
-<h2><a name="label-26" id="label-26">BioRuby Shell</a></h2><!-- RDLabel: "BioRuby Shell" -->
+<h2><a name="label-32" id="label-32">BioRuby Shell</a></h2><!-- RDLabel: "BioRuby Shell" -->
 <p>The BioRuby shell implementation you find in ./lib/bio/shell. It is very interesting
 as it uses IRB (the Ruby intepreter) which is a powerful environment described in
 <a href="http://ruby-doc.org/docs/ProgrammingRuby/html/irb.html">Programming Ruby's irb chapter</a>. IRB commands can directly be typed in the shell, e.g.</p>
@@ -1003,24 +1053,24 @@ as it uses IRB (the Ruby intepreter) which is a powerful environment described i
 <p>Optionally, you may also want to install Ruby readline support
 (on Debian, the libreadline-ruby package). To edit a previous line you may have to press
 line down (arrow down) first.</p>
-<h1><a name="label-27" id="label-27">Helpful tools</a></h1><!-- RDLabel: "Helpful tools" -->
+<h1><a name="label-33" id="label-33">Helpful tools</a></h1><!-- RDLabel: "Helpful tools" -->
 <p>Apart from rdoc you may also want to use rtags - which allows jumping around
 source code by clicking on class and method names. </p>
 <pre>cd bioruby/lib
 rtags -R --vi</pre>
 <p>For a tutorial see <a href="http://rtags.rubyforge.org/">&lt;URL:http://rtags.rubyforge.org/&gt;</a></p>
-<h1><a name="label-28" id="label-28">APPENDIX</a></h1><!-- RDLabel: "APPENDIX" -->
-<h2><a name="label-29" id="label-29">KEGG API</a></h2><!-- RDLabel: "KEGG API" -->
+<h1><a name="label-34" id="label-34">APPENDIX</a></h1><!-- RDLabel: "APPENDIX" -->
+<h2><a name="label-35" id="label-35">KEGG API</a></h2><!-- RDLabel: "KEGG API" -->
 <p>Please refer to KEGG_API.rd.ja (English version: <a href="http://www.genome.jp/kegg/soap/doc/keggapi_manual.html">&lt;URL:http://www.genome.jp/kegg/soap/doc/keggapi_manual.html&gt;</a> ) and</p>
 <ul>
 <li><a href="http://www.genome.jp/kegg/soap/">&lt;URL:http://www.genome.jp/kegg/soap/&gt;</a></li>
 </ul>
-<h2><a name="label-30" id="label-30">Ruby Ensembl API</a></h2><!-- RDLabel: "Ruby Ensembl API" -->
+<h2><a name="label-36" id="label-36">Ruby Ensembl API</a></h2><!-- RDLabel: "Ruby Ensembl API" -->
 <p>Ruby Ensembl API is a ruby API to the Ensembl database. It is NOT currently
 included in the BioRuby archives. To install it, see
 <a href="http://wiki.github.com/jandot/ruby-ensembl-api">&lt;URL:http://wiki.github.com/jandot/ruby-ensembl-api&gt;</a>
 for more information.</p>
-<h3><a name="label-31" id="label-31">Gene Ontology (GO) through the Ruby Ensembl API</a></h3><!-- RDLabel: "Gene Ontology (GO) through the Ruby Ensembl API" -->
+<h3><a name="label-37" id="label-37">Gene Ontology (GO) through the Ruby Ensembl API</a></h3><!-- RDLabel: "Gene Ontology (GO) through the Ruby Ensembl API" -->
 <p>Gene Ontologies can be fetched through the Ruby Ensembl API package:</p>
 <pre>require 'ensembl'
 Ensembl::Core::DBConnection.connect('drosophila_melanogaster')
@@ -1037,28 +1087,35 @@ infile.each do |line|
 end</pre>
 <p>This prints each mosquito accession/unique identifier and the GO terms from the Drosophila
 homologues.</p>
-<h2><a name="label-32" id="label-32">Comparing BioProjects</a></h2><!-- RDLabel: "Comparing BioProjects" -->
+<h2><a name="label-38" id="label-38">Comparing BioProjects</a></h2><!-- RDLabel: "Comparing BioProjects" -->
 <p>For a quick functional comparison of BioRuby, BioPerl, BioPython and Bioconductor (R) see <a href="http://sciruby.codeforpeople.com/sr.cgi/BioProjects">&lt;URL:http://sciruby.codeforpeople.com/sr.cgi/BioProjects&gt;</a></p>
-<h2><a name="label-33" id="label-33">Using BioRuby with R</a></h2><!-- RDLabel: "Using BioRuby with R" -->
+<h2><a name="label-39" id="label-39">Using BioRuby with R</a></h2><!-- RDLabel: "Using BioRuby with R" -->
 <p>Pjotr wrote a section on using Ruby with R on SciRuby. See <a href="http://sciruby.codeforpeople.com/sr.cgi/RubyWithRlang">&lt;URL:http://sciruby.codeforpeople.com/sr.cgi/RubyWithRlang&gt;</a></p>
-<h2><a name="label-34" id="label-34">Using BioPerl or BioPython from Ruby</a></h2><!-- RDLabel: "Using BioPerl or BioPython from Ruby" -->
+<h2><a name="label-40" id="label-40">Using BioPerl or BioPython from Ruby</a></h2><!-- RDLabel: "Using BioPerl or BioPython from Ruby" -->
 <p>At the moment there is no easy way of accessing BioPerl from Ruby. The best way, perhaps, is to create a Perl server that gets accessed through XML/RPC or SOAP.</p>
-<h2><a name="label-35" id="label-35">Installing required external library</a></h2><!-- RDLabel: "Installing required external library" -->
-<p>At this point for using BioRuby no additional libraries are needed.
-This may change, so keep an eye on the Bioruby website. Also when
+<h2><a name="label-41" id="label-41">Installing required external library</a></h2><!-- RDLabel: "Installing required external library" -->
+<p>At this point BioRuby needs no additional libraries, unless
+you use the Bio::PhyloXML module, which requires libxml-ruby.</p>
+<p>This may change, so keep an eye on the BioRuby website. Also, when
 a package is missing, BioRuby should show an informative message.</p>
 <p>At this point installing third party Ruby packages can be a bit
 painful, as the gem standard for packages evolved late and some still
 force you to copy things by hand. Therefore, carefully read the READMEs
 that come with each package.</p>
-<h2><a name="label-36" id="label-36">Trouble shooting</a></h2><!-- RDLabel: "Trouble shooting" -->
+<h3><a name="label-42" id="label-42">Installing libxml-ruby</a></h3><!-- RDLabel: "Installing libxml-ruby" -->
+<p>The simplest way is to use the gem packaging system.</p>
+<pre>gem install -r libxml-ruby</pre>
+<p>If you get the error `require': no such file to load - mkmf (LoadError), then on Debian-based systems run</p>
+<pre>sudo apt-get install ruby-dev</pre>
+<p>If you have other problems with the installation, see <a href="http://libxml.rubyforge.org/install.xml">&lt;URL:http://libxml.rubyforge.org/install.xml&gt;</a>.</p>
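 <p>To check from Ruby whether the installation succeeded, a script can try to load the library and fall back to a hint. The following is only a minimal sketch: library_available? is a hypothetical helper (not part of BioRuby), and it assumes libxml-ruby is loaded with require 'libxml'.</p>

```ruby
# Hypothetical helper (not part of BioRuby): returns true if the named
# library can be required, false if Ruby raises LoadError for it.
def library_available?(name)
  require name
  true
rescue LoadError
  false
end

# Assumption: libxml-ruby is loaded under the name 'libxml'.
if library_available?('libxml')
  puts 'libxml-ruby found; Bio::PhyloXML should be usable.'
else
  puts 'libxml-ruby not found; try: gem install -r libxml-ruby'
end
```

 <p>Rescuing only LoadError keeps other failures visible, such as an error raised while the library itself is loading.</p>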
+<h2><a name="label-43" id="label-43">Trouble shooting</a></h2><!-- RDLabel: "Trouble shooting" -->
 <ul>
 <li>Error: in `require': no such file to load -- bio (LoadError)</li>
 </ul>
 <p>Ruby fails to find the BioRuby libraries. Add their location to the RUBYLIB path, or pass
 it to the interpreter. For example:</p>
 <pre>ruby -I$BIORUBYPATH/lib yourprogram.rb</pre>
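 <p>The same effect can be achieved from inside a script by extending Ruby's load path before requiring the library. This is a sketch under stated assumptions: add_to_load_path is a hypothetical helper, BIORUBYPATH is the environment variable from the command above, and /path/to/bioruby is only a placeholder.</p>

```ruby
# Hypothetical helper: prepend a directory to Ruby's load path, which is
# what the -I command-line option does for the whole program.
def add_to_load_path(dir)
  full = File.expand_path(dir)
  $LOAD_PATH.unshift(full) unless $LOAD_PATH.include?(full)
  full
end

# BIORUBYPATH comes from the environment; the fallback is a placeholder
# and must be replaced with the real location of the BioRuby sources.
bioruby_lib = add_to_load_path(
  File.join(ENV.fetch('BIORUBYPATH', '/path/to/bioruby'), 'lib')
)

begin
  require 'bio'
  puts "BioRuby loaded (searched #{bioruby_lib} first)"
rescue LoadError => e
  warn "Still cannot load BioRuby: #{e.message}"
end
```

 <p>Setting RUBYLIB or using -I is usually preferable, since it keeps installation paths out of the program itself.</p>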
-<h2><a name="label-37" id="label-37">Modifying this page</a></h2><!-- RDLabel: "Modifying this page" -->
+<h2><a name="label-44" id="label-44">Modifying this page</a></h2><!-- RDLabel: "Modifying this page" -->
 <p>IMPORTANT NOTICE: This page is maintained in the BioRuby source code
 repository. Please edit the file there otherwise changes may get
 lost. See <!-- Reference, RDLabel "BioRuby Developer Information" doesn't exist --><em class="label-not-found">BioRuby Developer Information</em><!-- Reference end --> for repository and mailing list