bioruby-phyloxml 1.0.0
- checksums.yaml +7 -0
- data/.gitignore +9 -0
- data/BSDL +22 -0
- data/COPYING +57 -0
- data/COPYING.ja +51 -0
- data/GPL +340 -0
- data/Gemfile +4 -0
- data/LEGAL +36 -0
- data/LGPL +504 -0
- data/README.md +214 -0
- data/Rakefile +20 -0
- data/bioruby-phyloxml.gemspec +36 -0
- data/doc/Tutorial.rd +152 -0
- data/lib/bio-phyloxml.rb +27 -0
- data/lib/bio-phyloxml/compat/cleanup.rb +13 -0
- data/lib/bio-phyloxml/compat/stub_phyloxml_elements.rb +1 -0
- data/lib/bio-phyloxml/compat/stub_phyloxml_parser.rb +1 -0
- data/lib/bio-phyloxml/compat/stub_phyloxml_writer.rb +1 -0
- data/lib/bio-phyloxml/phyloxml.xsd +582 -0
- data/lib/bio-phyloxml/phyloxml_elements.rb +1186 -0
- data/lib/bio-phyloxml/phyloxml_parser.rb +1001 -0
- data/lib/bio-phyloxml/phyloxml_writer.rb +227 -0
- data/lib/bio-phyloxml/version.rb +7 -0
- data/lib/bio/db/phyloxml/phyloxml_elements.rb +4 -0
- data/lib/bio/db/phyloxml/phyloxml_parser.rb +4 -0
- data/lib/bio/db/phyloxml/phyloxml_writer.rb +4 -0
- data/lib/bioruby-phyloxml.rb +10 -0
- data/sample/test_phyloxml_big.rb +205 -0
- metadata +156 -0
data/README.md
ADDED
@@ -0,0 +1,214 @@
|
|
1
|
+
# bio-phyloxml
|
2
|
+
|
3
|
+
[Build status](http://travis-ci.org/bioruby/bioruby-phyloxml)
|
4
|
+
|
5
|
+
bio-phyloxml (the package name on RubyGems.org is bioruby-phyloxml)
|
6
|
+
is a [phyloXML](http://www.phyloxml.org/) plugin for
|
7
|
+
[BioRuby](http://bioruby.org/), an open source bioinformatics
|
8
|
+
library for Ruby.
|
9
|
+
|
10
|
+
phyloXML is an XML language for saving, analyzing and exchanging data
|
11
|
+
of annotated phylogenetic trees. The phyloXML parser in BioRuby is
|
12
|
+
implemented in Bio::PhyloXML::Parser, and its writer in
|
13
|
+
Bio::PhyloXML::Writer. More information can be found at
|
14
|
+
[phyloxml.org](http://www.phyloxml.org).
|
15
|
+
|
16
|
+
This phyloXML code has historically been part of the core BioRuby
|
17
|
+
[gem](https://github.com/bioruby/bioruby), but has been split into its
|
18
|
+
own gem as part of an effort to
|
19
|
+
[modularize](http://bioruby.open-bio.org/wiki/Plugins)
|
20
|
+
BioRuby. bio-phyloxml and many more plugins are available at
|
21
|
+
[biogems.info](http://www.biogems.info/).
|
22
|
+
|
23
|
+
This code was originally written by Diana Jaunzeikare during the
|
24
|
+
Google Summer of Code 2009 for the
|
25
|
+
[Implementing phyloXML support in BioRuby](http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2009#Implementing_phyloXML_support_in_BioRuby)
|
26
|
+
project with NESCent, mentored by Christian Zmasek et al. For details
|
27
|
+
of development, see
|
28
|
+
[github.com/latvianlinuxgirl/bioruby](https://github.com/latvianlinuxgirl/bioruby)
|
29
|
+
and the BioRuby mailing list archives.
|
30
|
+
|
31
|
+
## Requirements
|
32
|
+
|
33
|
+
bio-phyloxml uses [libxml-ruby](http://xml4r.github.com/libxml-ruby/),
|
34
|
+
which requires several C libraries and their headers to be installed:
|
35
|
+
* `zlib`
|
36
|
+
* `libiconv`
|
37
|
+
* `libxml`
|
38
|
+
|
39
|
+
With these installed, install the `libxml-ruby` gem:
|
40
|
+
|
41
|
+
```sh
|
42
|
+
gem install libxml-ruby
|
43
|
+
```
|
44
|
+
|
45
|
+
If you see "ERROR: Failed to build gem native extension", the above
|
46
|
+
C libraries and their headers may be missing. See doc/Tutorial.rd
|
47
|
+
for how to install them on some systems.
|
48
|
+
|
49
|
+
bio-phyloxml also uses the `bio` gem. It will automatically be installed
|
50
|
+
during the installation of `bio-phyloxml` in normal cases.
|
51
|
+
|
52
|
+
For more information see the
|
53
|
+
[libxml page](https://rubygems.org/gems/libxml-ruby) and
|
54
|
+
the [BioRuby installation page](http://bioruby.open-bio.org/wiki/Installation).
|
55
|
+
|
56
|
+
|
57
|
+
## Installation
|
58
|
+
|
59
|
+
```sh
|
60
|
+
gem install bioruby-phyloxml
|
61
|
+
```
|
62
|
+
|
63
|
+
Note: Please uninstall the old bio-phyloxml gem, which has not been maintained
|
64
|
+
since 2012. The old bio-phyloxml gem was created in 2012 as a preliminary
|
65
|
+
trial of splitting BioRuby components into separate gems.
|
66
|
+
We tried to contact the author of the old bio-phyloxml gem, but received no response.
|
67
|
+
|
68
|
+
```sh
|
69
|
+
gem uninstall bio-phyloxml
|
70
|
+
```
|
71
|
+
|
72
|
+
## Migration
|
73
|
+
|
74
|
+
Users who were previously using the phyloXML support in the core
|
75
|
+
BioRuby gem should be able to migrate to using this gem very
|
76
|
+
easily. Simply install the `bioruby-phyloxml` gem as described above, and
|
77
|
+
add `require 'bio-phyloxml'` to the relevant application code.
|
78
|
+
|
79
|
+
## Usage
|
80
|
+
|
81
|
+
```ruby
|
82
|
+
require 'bio-phyloxml'
|
83
|
+
```
|
84
|
+
|
85
|
+
### Parsing a file
|
86
|
+
|
87
|
+
```ruby
|
88
|
+
require 'bio-phyloxml'
|
89
|
+
|
90
|
+
# Create new phyloxml parser
|
91
|
+
phyloxml = Bio::PhyloXML::Parser.open('example.xml')
|
92
|
+
|
93
|
+
# Print the names of all trees in the file
|
94
|
+
phyloxml.each do |tree|
|
95
|
+
  puts tree.name
|
96
|
+
end
|
97
|
+
```
|
98
|
+
|
99
|
+
If there are several trees in the file, you can access the one you wish by specifying its index:
|
100
|
+
|
101
|
+
```ruby
|
102
|
+
tree = phyloxml[3]
|
103
|
+
```
|
104
|
+
You can use all Bio::Tree methods on the tree, since PhyloXML::Tree inherits from Bio::Tree. For example,
|
105
|
+
|
106
|
+
```ruby
|
107
|
+
tree.leaves.each do |node|
|
108
|
+
  puts node.name
|
109
|
+
end
|
110
|
+
```
|
111
|
+
|
112
|
+
PhyloXML files can hold additional information besides phylogenies at the end of the file. This info can be accessed through the 'other' array of the parser object.
|
113
|
+
|
114
|
+
```ruby
|
115
|
+
phyloxml = Bio::PhyloXML::Parser.open('example.xml')
|
116
|
+
while tree = phyloxml.next_tree
|
117
|
+
  # do stuff with trees
|
118
|
+
end
|
119
|
+
|
120
|
+
puts phyloxml.other
|
121
|
+
```
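To make the file layout described above concrete, here is a minimal sketch that separates the top-level `<phylogeny>` elements from any trailing foreign elements. It deliberately uses REXML from Ruby's standard distribution rather than this gem, so it only illustrates the document structure, not how `Bio::PhyloXML::Parser` works internally. The sample document, the constant `SAMPLE_PHYLOXML`, and the helper `split_top_level` are assumptions made up for illustration.

```ruby
require 'rexml/document'

# A tiny phyloXML-style document (made up for this sketch): one phylogeny
# followed by a foreign element like the alignment shown in this README.
SAMPLE_PHYLOXML = <<~XML
  <phyloxml xmlns="http://www.phyloxml.org">
    <phylogeny rooted="true">
      <name>example tree</name>
      <clade>
        <clade><name>A</name></clade>
        <clade><name>B</name></clade>
      </clade>
    </phylogeny>
    <align:alignment xmlns:align="http://example.org/align">
      <seq name="A">acgtcgcggcccgtggaagtcctctcct</seq>
      <seq name="B">aggtcgcggcctgtggaagtcctctcct</seq>
    </align:alignment>
  </phyloxml>
XML

# Hypothetical helper: split the root's child elements into
# [phylogenies, everything_else].
def split_top_level(xml)
  root = REXML::Document.new(xml).root
  elements = root.children.select { |child| child.is_a?(REXML::Element) }
  elements.partition { |element| element.name == 'phylogeny' }
end

phylogenies, others = split_top_level(SAMPLE_PHYLOXML)
puts "phylogenies: #{phylogenies.size}"
puts "other elements: #{others.map(&:expanded_name).join(', ')}"
```

With this gem, the elements in the second group are what would be expected to appear in the parser's `other` array.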
|
122
|
+
|
123
|
+
### Writing a file
|
124
|
+
|
125
|
+
```ruby
|
126
|
+
# Create new phyloxml writer
|
127
|
+
writer = Bio::PhyloXML::Writer.new('tree.xml')
|
128
|
+
|
129
|
+
# Write tree to the file tree.xml
|
130
|
+
writer.write(tree1)
|
131
|
+
|
132
|
+
# Add another tree to the file
|
133
|
+
writer.write(tree2)
|
134
|
+
```
|
135
|
+
|
136
|
+
### Retrieving data
|
137
|
+
|
138
|
+
Here is an example of how to retrieve the scientific name of the clades included in each tree.
|
139
|
+
|
140
|
+
```ruby
|
141
|
+
require 'bio-phyloxml'
|
142
|
+
|
143
|
+
phyloxml = Bio::PhyloXML::Parser.open('ncbi_taxonomy_mollusca.xml')
|
144
|
+
phyloxml.each do |tree|
|
145
|
+
  tree.each_node do |node|
|
146
|
+
    print "Scientific name: ", node.taxonomies[0].scientific_name, "\n"
|
147
|
+
  end
|
148
|
+
end
|
149
|
+
```
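The example above indexes `node.taxonomies[0]` directly, which would raise `NoMethodError` for any node whose `taxonomies` array is empty. The following sketch shows one defensive variant. It assumes (not verified here) that a node without taxonomy data exposes an empty array, and it uses hypothetical `Struct` stand-ins (`StubNode`, `StubTaxonomy`) instead of the gem's own classes so that it runs without any dependencies.

```ruby
# Hypothetical stand-ins for the gem's node and taxonomy objects; only the
# attributes used in the README example are modelled.
StubTaxonomy = Struct.new(:scientific_name)
StubNode = Struct.new(:name, :taxonomies)

# Collect scientific names, skipping nodes that carry no taxonomy.
# `first&.` returns nil instead of raising when the array is empty.
def scientific_names(nodes)
  nodes.map { |node| node.taxonomies.first&.scientific_name }.compact
end

nodes = [
  StubNode.new('root', [StubTaxonomy.new('Mollusca')]),
  StubNode.new('unannotated', []),
  StubNode.new('leaf', [StubTaxonomy.new('Gastropoda')])
]

scientific_names(nodes).each { |name| puts "Scientific name: #{name}" }
```

The same `first&.scientific_name` expression could be used inside `tree.each_node` if the file may contain clades without taxonomy elements.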
|
150
|
+
|
151
|
+
### Retrieving 'other' data
|
152
|
+
|
153
|
+
```ruby
|
154
|
+
require 'bio-phyloxml'
|
155
|
+
|
156
|
+
phyloxml = Bio::PhyloXML::Parser.open('phyloxml_examples.xml')
|
157
|
+
while tree = phyloxml.next_tree
|
158
|
+
  # do something with the trees
|
159
|
+
end
|
160
|
+
|
161
|
+
p phyloxml.other
|
162
|
+
puts "\n"
|
163
|
+
#=> output is an object representation
|
164
|
+
|
165
|
+
# Print in a readable way
|
166
|
+
puts phyloxml.other[0].to_xml, "\n"
|
167
|
+
#=>:
|
168
|
+
#
|
169
|
+
#<align:alignment xmlns:align="http://example.org/align">
|
170
|
+
# <seq name="A">acgtcgcggcccgtggaagtcctctcct</seq>
|
171
|
+
# <seq name="B">aggtcgcggcctgtggaagtcctctcct</seq>
|
172
|
+
# <seq name="C">taaatcgc--cccgtgg-agtccc-cct</seq>
|
173
|
+
#</align:alignment>
|
174
|
+
|
175
|
+
# Once we know what's there, let's output just the sequences
|
176
|
+
phyloxml.other[0].children.each do |node|
|
177
|
+
  puts node.value
|
178
|
+
end
|
179
|
+
#=>
|
180
|
+
#
|
181
|
+
#acgtcgcggcccgtggaagtcctctcct
|
182
|
+
#aggtcgcggcctgtggaagtcctctcct
|
183
|
+
#taaatcgc--cccgtgg-agtccc-cct
|
184
|
+
```
|
185
|
+
|
186
|
+
The API doc is online. (TODO: generate and link) For more code
|
187
|
+
examples see the test files in the source tree.
|
188
|
+
|
189
|
+
## Project home page
|
190
|
+
|
191
|
+
For information on the source tree, documentation, examples, issues and
|
192
|
+
how to contribute, see
|
193
|
+
|
194
|
+
http://github.com/bioruby/bioruby-phyloxml
|
195
|
+
|
196
|
+
The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
|
197
|
+
|
198
|
+
## Cite
|
199
|
+
|
200
|
+
If you use this software, please cite one of
|
201
|
+
|
202
|
+
* [BioRuby: bioinformatics software for the Ruby programming language](http://dx.doi.org/10.1093/bioinformatics/btq475)
|
203
|
+
* [Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics](http://dx.doi.org/10.1093/bioinformatics/bts080)
|
204
|
+
|
205
|
+
## Biogems.info
|
206
|
+
|
207
|
+
This Biogem is published at [#bio-phyloxml](http://biogems.info/index.html)
|
208
|
+
|
209
|
+
## Copyright
|
210
|
+
|
211
|
+
Copyright (c) 2009 Diana Jaunzeikare and BioRuby project.
|
212
|
+
See COPYING or COPYING.ja for further details.
|
213
|
+
|
214
|
+
This README.md was first written by Clayton Wheeler.
|
data/Rakefile
ADDED
@@ -0,0 +1,20 @@
|
|
1
|
+
require "bundler/gem_tasks"
|
2
|
+
require 'rdoc/task'
|
3
|
+
require 'rake/testtask'
|
4
|
+
|
5
|
+
task :default => "test"
|
6
|
+
|
7
|
+
Rake::TestTask.new do |t|
|
8
|
+
t.test_files = FileList["test/unit/**/test_*.rb"]
|
9
|
+
end
|
10
|
+
|
11
|
+
Rake::RDocTask.new do |r|
|
12
|
+
r.rdoc_dir = "rdoc"
|
13
|
+
r.rdoc_files.include("README.md",
|
14
|
+
"COPYING", "COPYING.ja", "BSDL", "LGPL", "GPL",
|
15
|
+
"doc/Tutorial.rd",
|
16
|
+
"lib/**/*.rb")
|
17
|
+
r.main = "README.md"
|
18
|
+
r.options << '--title' << 'Bio::PhyloXML API documentation'
|
19
|
+
r.options << '--line-numbers'
|
20
|
+
end
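The test task above selects files with the pattern `test/unit/**/test_*.rb`. As a rough illustration of which paths such a pattern picks up, the sketch below builds a throwaway directory tree and evaluates the same pattern with `Dir.glob`. It uses `Dir.glob` as a stand-in for Rake's `FileList`, on the assumption that both resolve the pattern the same way. The file names and the helpers `matching_files` and `demo_test_selection` are hypothetical.

```ruby
require 'fileutils'
require 'tmpdir'

# Return the paths (relative to +root+) matched by +pattern+, sorted.
def matching_files(root, pattern)
  Dir.chdir(root) { Dir.glob(pattern).sort }
end

# Build a temporary tree with made-up file names and return the matches
# for the Rakefile's test pattern.
def demo_test_selection
  Dir.mktmpdir do |root|
    %w[
      test/unit/test_parser.rb
      test/unit/nested/test_writer.rb
      test/unit/helper.rb
      test/functional/test_big.rb
    ].each do |path|
      full = File.join(root, path)
      FileUtils.mkdir_p(File.dirname(full))
      FileUtils.touch(full)
    end
    matching_files(root, 'test/unit/**/test_*.rb')
  end
end

puts demo_test_selection
```

Only files under `test/unit` (at any depth) whose names start with `test_` are selected; `helper.rb` and anything under `test/functional` are left out.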
|
data/bioruby-phyloxml.gemspec
ADDED
@@ -0,0 +1,36 @@
|
|
1
|
+
# coding: utf-8
|
2
|
+
lib = File.expand_path('../lib', __FILE__)
|
3
|
+
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
|
4
|
+
require 'bio-phyloxml/version'
|
5
|
+
|
6
|
+
Gem::Specification.new do |spec|
|
7
|
+
spec.name = "bioruby-phyloxml"
|
8
|
+
spec.version = Bio::PhyloXML::VERSION
|
9
|
+
spec.authors = [ "Diana Jaunzeikare", "Clayton Wheeler",
|
10
|
+
"BioRuby project" ]
|
11
|
+
spec.email = [ "staff@bioruby.org" ]
|
12
|
+
|
13
|
+
spec.summary = %q{PhyloXML plugin for BioRuby}
|
14
|
+
spec.description = %q{Provides PhyloXML support for BioRuby. This bioruby-phyloxml gem replaces old unmaintained bio-phyloxml gem.}
|
15
|
+
spec.homepage = "http://github.com/bioruby/bioruby-phyloxml"
|
16
|
+
spec.license = "Ruby"
|
17
|
+
|
18
|
+
spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
|
19
|
+
spec.bindir = "exe"
|
20
|
+
spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
|
21
|
+
spec.require_paths = ["lib"]
|
22
|
+
|
23
|
+
spec.extra_rdoc_files = [ "README.md",
|
24
|
+
"COPYING", "COPYING.ja", "BSDL", "LGPL", "GPL",
|
25
|
+
"doc/Tutorial.rd" ]
|
26
|
+
spec.rdoc_options << '--main' << 'README.md'
|
27
|
+
spec.rdoc_options << '--title' << 'Bio::PhyloXML API documentation'
|
28
|
+
spec.rdoc_options << '--line-numbers'
|
29
|
+
|
30
|
+
spec.add_runtime_dependency "bio", ">= 1.5.0"
|
31
|
+
spec.add_runtime_dependency "libxml-ruby", "~> 2.8"
|
32
|
+
|
33
|
+
spec.add_development_dependency "bundler", "~> 1.10"
|
34
|
+
spec.add_development_dependency "rake", "~> 10.0"
|
35
|
+
spec.add_development_dependency "rdoc", "~> 4"
|
36
|
+
end
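Two details of the gemspec above are easy to misread: what the `~>` and `>=` dependency constraints accept, and which files the `spec.files` filter drops. The sketch below checks both using RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The helper names `satisfies?` and `packaged` and the sample file list are assumptions for illustration only.

```ruby
require 'rubygems'

# True when +version+ satisfies the RubyGems +constraint+ string.
def satisfies?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# Same filter as the gemspec's spec.files line, applied to a plain array.
def packaged(files)
  files.reject { |f| f.match(%r{^(test|spec|features)/}) }
end

puts satisfies?('~> 2.8', '2.9.1')    # within the 2.x series
puts satisfies?('~> 2.8', '3.0.0')    # next major version
puts satisfies?('>= 1.5.0', '2.0.0')  # no upper bound

# Hypothetical file list: only paths that start with test/, spec/ or
# features/ are dropped.
puts packaged(%w[
  README.md
  lib/bio-phyloxml.rb
  sample/test_phyloxml_big.rb
  test/unit/test_phyloxml.rb
])
```

Because the filter is anchored at the start of the path, a file such as `sample/test_phyloxml_big.rb` stays in the package even though its name contains `test`.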
|
data/doc/Tutorial.rd
ADDED
@@ -0,0 +1,152 @@
|
|
1
|
+
# This document is generated with a version of rd2html (part of Hiki)
|
2
|
+
#
|
3
|
+
# A possible test run could be from rdtool (on Debian package rdtool)
|
4
|
+
#
|
5
|
+
# rd2 $BIORUBYPATH/doc/Tutorial.rd
|
6
|
+
#
|
7
|
+
# or with style sheet:
|
8
|
+
#
|
9
|
+
# rd2 -r rd/rd2html-lib.rb --with-css=bioruby.css $BIORUBYPATH/doc/Tutorial.rd > ~/bioruby.html
|
10
|
+
#
|
11
|
+
# in Debian:
|
12
|
+
#
|
13
|
+
# rd2 -r rd/rd2html-lib --with-css="../lib/bio/shell/rails/vendor/plugins/bioruby/generators/bioruby/templates/bioruby.css" Tutorial.rd > Tutorial.rd.html
|
14
|
+
#
|
15
|
+
# A common problem is tabs in the text file! TABs are not allowed.
|
16
|
+
#
|
17
|
+
# To add tests run Toshiaki's bioruby shell and paste in the query plus
|
18
|
+
# results.
|
19
|
+
#
|
20
|
+
# To run the embedded Ruby doctests you can use the rubydoctest tool, part
|
21
|
+
# of the bioruby-support repository at http://github.com/pjotrp/bioruby-support/
|
22
|
+
#
|
23
|
+
|
24
|
+
=begin
|
25
|
+
#doctest Testing bioruby
|
26
|
+
|
27
|
+
= Bio::PhyloXML Tutorial
|
28
|
+
|
29
|
+
* Copyright (C) 2001-2003 KATAYAMA Toshiaki <k .at. bioruby.org>
|
30
|
+
* Copyright (C) 2005-2009 Pjotr Prins, Naohisa Goto and others
|
31
|
+
|
32
|
+
= PhyloXML
|
33
|
+
|
34
|
+
PhyloXML is an XML language for saving, analyzing and exchanging data of
|
35
|
+
annotated phylogenetic trees. The PhyloXML parser in BioRuby is implemented in
|
36
|
+
Bio::PhyloXML::Parser, and the writer in Bio::PhyloXML::Writer.
|
37
|
+
More information is available at www.phyloxml.org.
|
38
|
+
|
39
|
+
== Install
|
40
|
+
|
41
|
+
% gem install bioruby-phyloxml
|
42
|
+
|
43
|
+
In addition to bioruby-phyloxml, dependent gems such as bio and libxml-ruby
|
44
|
+
will automatically be installed.
|
45
|
+
|
46
|
+
== Parsing a file
|
47
|
+
|
48
|
+
require 'bio-phyloxml'
|
49
|
+
|
50
|
+
# Create new phyloxml parser
|
51
|
+
phyloxml = Bio::PhyloXML::Parser.open('example.xml')
|
52
|
+
|
53
|
+
# Print the names of all trees in the file
|
54
|
+
phyloxml.each do |tree|
|
55
|
+
puts tree.name
|
56
|
+
end
|
57
|
+
|
58
|
+
If there are several trees in the file, you can access the one you want by its index:
|
59
|
+
|
60
|
+
tree = phyloxml[3]
|
61
|
+
|
62
|
+
You can use all Bio::Tree methods on the tree, since PhyloXML::Tree inherits from Bio::Tree. For example,
|
63
|
+
|
64
|
+
tree.leaves.each do |node|
|
65
|
+
puts node.name
|
66
|
+
end
|
67
|
+
|
68
|
+
PhyloXML files can hold additional information besides phylogenies at the end of the file. This info can be accessed through the 'other' array of the parser object.
|
69
|
+
|
70
|
+
phyloxml = Bio::PhyloXML::Parser.open('example.xml')
|
71
|
+
while tree = phyloxml.next_tree
|
72
|
+
# do stuff with trees
|
73
|
+
end
|
74
|
+
|
75
|
+
puts phyloxml.other
|
76
|
+
|
77
|
+
== Writing a file
|
78
|
+
|
79
|
+
# Create new phyloxml writer
|
80
|
+
writer = Bio::PhyloXML::Writer.new('tree.xml')
|
81
|
+
|
82
|
+
# Write tree to the file tree.xml
|
83
|
+
writer.write(tree1)
|
84
|
+
|
85
|
+
# Add another tree to the file
|
86
|
+
writer.write(tree2)
|
87
|
+
|
88
|
+
== Retrieving data
|
89
|
+
|
90
|
+
Here is an example of how to retrieve the scientific name of the clades.
|
91
|
+
|
92
|
+
require 'bio-phyloxml'
|
93
|
+
|
94
|
+
phyloxml = Bio::PhyloXML::Parser.open('ncbi_taxonomy_mollusca.xml')
|
95
|
+
phyloxml.each do |tree|
|
96
|
+
tree.each_node do |node|
|
97
|
+
print "Scientific name: ", node.taxonomies[0].scientific_name, "\n"
|
98
|
+
end
|
99
|
+
end
|
100
|
+
|
101
|
+
== Retrieving 'other' data
|
102
|
+
|
103
|
+
require 'bio-phyloxml'
|
104
|
+
|
105
|
+
phyloxml = Bio::PhyloXML::Parser.open('phyloxml_examples.xml')
|
106
|
+
while tree = phyloxml.next_tree
|
107
|
+
# do something with the trees
|
108
|
+
end
|
109
|
+
|
110
|
+
p phyloxml.other
|
111
|
+
puts "\n"
|
112
|
+
#=> output is an object representation
|
113
|
+
|
114
|
+
# Print in a readable way
|
115
|
+
puts phyloxml.other[0].to_xml, "\n"
|
116
|
+
#=>:
|
117
|
+
#
|
118
|
+
#<align:alignment xmlns:align="http://example.org/align">
|
119
|
+
# <seq name="A">acgtcgcggcccgtggaagtcctctcct</seq>
|
120
|
+
# <seq name="B">aggtcgcggcctgtggaagtcctctcct</seq>
|
121
|
+
# <seq name="C">taaatcgc--cccgtgg-agtccc-cct</seq>
|
122
|
+
#</align:alignment>
|
123
|
+
|
124
|
+
# Once we know what's there, let's output just the sequences
|
125
|
+
phyloxml.other[0].children.each do |node|
|
126
|
+
puts node.value
|
127
|
+
end
|
128
|
+
#=>
|
129
|
+
#
|
130
|
+
#acgtcgcggcccgtggaagtcctctcct
|
131
|
+
#aggtcgcggcctgtggaagtcctctcct
|
132
|
+
#taaatcgc--cccgtgg-agtccc-cct
|
133
|
+
|
134
|
+
|
135
|
+
= APPENDIX
|
136
|
+
|
137
|
+
== Troubleshooting libxml-ruby installation problems
|
138
|
+
|
139
|
+
If you get a "Failed to build gem native extension" error, you may need to
|
140
|
+
install the GNOME Libxml2 XML toolkit library and development files.
|
141
|
+
|
142
|
+
On Debian or Ubuntu,
|
143
|
+
|
144
|
+
sudo aptitude install libxml2-dev
|
145
|
+
|
146
|
+
On RedHat or CentOS,
|
147
|
+
|
148
|
+
sudo yum install libxml2-devel
|
149
|
+
|
150
|
+
On other platforms, see ((<URL:http://www.xmlsoft.org/>)).
|
151
|
+
|
152
|
+
=end
|
data/lib/bio-phyloxml.rb
ADDED
@@ -0,0 +1,27 @@
|
|
1
|
+
# Please require your code below, respecting the naming conventions in the
|
2
|
+
# bioruby directory tree.
|
3
|
+
#
|
4
|
+
# For example, say you have a plugin named bio-plugin, the only uncommented
|
5
|
+
# line in this file would be
|
6
|
+
#
|
7
|
+
# require 'bio/bio-plugin/plugin'
|
8
|
+
#
|
9
|
+
# In this file only require other files. Avoid other source code.
|
10
|
+
|
11
|
+
require 'bio-phyloxml/compat/cleanup.rb'
|
12
|
+
require 'bio-phyloxml/version.rb'
|
13
|
+
require 'bio-phyloxml/phyloxml_elements.rb'
|
14
|
+
require 'bio-phyloxml/phyloxml_parser.rb'
|
15
|
+
require 'bio-phyloxml/phyloxml_writer.rb'
|
16
|
+
|
17
|
+
if require 'bio-phyloxml/compat/stub_phyloxml_elements.rb'
|
18
|
+
require_relative 'bio/db/phyloxml/phyloxml_elements.rb'
|
19
|
+
end
|
20
|
+
|
21
|
+
if require 'bio-phyloxml/compat/stub_phyloxml_parser.rb'
|
22
|
+
require_relative 'bio/db/phyloxml/phyloxml_parser.rb'
|
23
|
+
end
|
24
|
+
|
25
|
+
if require 'bio-phyloxml/compat/stub_phyloxml_writer.rb'
|
26
|
+
require_relative 'bio/db/phyloxml/phyloxml_writer.rb'
|
27
|
+
end
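The `if require ...` blocks above rely on the return value of `Kernel#require`: it returns `true` when the file is loaded for the first time and `false` when that feature has already been loaded. Each `require_relative` therefore runs only when the corresponding stub was not loaded earlier. What the stub files contain is not shown in this section, so the sketch below only demonstrates the return-value behaviour, using a temporary file and a hypothetical feature name.

```ruby
require 'tmpdir'

# Require +feature+ twice from a temporary directory and return both
# results as [first, second].
def require_twice(feature)
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, "#{feature}.rb"), "# empty stub\n")
    $LOAD_PATH.unshift(dir)
    begin
      [require(feature), require(feature)]
    ensure
      # Leave the load path as we found it.
      $LOAD_PATH.delete(dir)
    end
  end
end

# The feature name is made up; any name not already loaded would do.
first, second = require_twice('hypothetical_stub_for_require_demo')
puts "first require:  #{first}"
puts "second require: #{second}"
```

The first call reports that the file was loaded, and the second reports that nothing was done, which is the distinction the `if` statements in this file branch on.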
|