bio-nexml 1.0.0 → 1.1.0

@@ -0,0 +1,3 @@
1
+ bio-nexml\.kpf
2
+ pkg
3
+ Gemfile.lock
@@ -0,0 +1,10 @@
1
+ language: ruby
2
+ rvm:
3
+ - 1.8.7
4
+ - 1.9.2
5
+
6
+ notifications:
7
+ email:
8
+ recipients:
9
+ - anurag08priyam@gmail.com
10
+ - rutgeraldo@gmail.com
data/Gemfile ADDED
@@ -0,0 +1,11 @@
1
+ source "http://rubygems.org"
2
+
3
+ gemspec
4
+ gem "bio", ">= 1.4.2"
5
+
6
+ group :development do
7
+ gem "rake"
8
+ gem "rspec", "~> 2.3.0"
9
+ gem "bundler"
10
+ end
11
+
@@ -0,0 +1,459 @@
1
+ [![Build status](https://secure.travis-ci.org/nexml/bio-nexml.png)](http://travis-ci.org/#!/nexml/bio-nexml)
2
+
3
+ bio-nexml is listed at http://biogems.info/
4
+
5
+ # bio-nexml
6
+
7
+ NeXML is a file format for phylogenetic data. It is inspired by the modular
8
+ architecture of the commonly-used NEXUS file format (hence the name) in that
9
+ a NeXML instance document can contain:
10
+ * sets of Operational Taxonomic Units (OTUs), i.e. the tips in phylogenetic
11
+ trees, and the entities on which comparative observations are made. Often these are
12
+ species ("taxa").
13
+ * sets of phylogenetic trees (or reticulate trees, i.e. networks)
14
+ * sets of comparative data, i.e. molecular sequences, morphological categorical
15
+ data, continuous data, and other types.
16
+
17
+ The elements in a NeXML document can be annotated using RDFa
18
+ (http://en.wikipedia.org/wiki/RDFa), which means that every object that can
19
+ be parsed out of a NeXML document must be an object that, in turn, can be
20
+ annotated with predicates (and their namespaces) and other objects (with,
21
+ perhaps, their own namespaces). The advantage over previous file formats is
22
+ that we can retain all metadata for all objects within one file, regardless
23
+ of where the metadata come from.
24
+
25
+ NeXML can be transformed to RDF using an XSL stylesheet. As such, NeXML forms
26
+ an intermediate format between traditional flat file formats (with predictable
27
+ structure but no semantics) and RDF (with loose structure, but lots of
28
+ semantics) that is both easy to work with and ready for the Semantic Web.
29
+
30
+ To learn more, visit http://www.nexml.org
31
+
32
+ ## Parsing
33
+ Currently, all parsing is done up front (i.e. no streaming). This is likely to change later. To parse a NeXML file:
34
+
35
+ ```ruby
36
+ doc = Bio::NeXML::Parser.new( "trees.xml" )
37
+ nexml = doc.parse
38
+ nexml.class #Bio::NeXML::Nexml
39
+ ```
40
+
41
+ ## Serializing
42
+ The `Bio::NeXML::Writer` class provides a wrapper over libxml-ruby for creating NeXML documents. It defines a set of `serialize_*` instance methods, each of which takes the appropriate object and returns its NeXML representation as an `XML::Node` object. To get the raw NeXML string, call `to_s` on the return value.
43
+
44
+ NeXML defines three top-level containers, `otus`, `trees`, and `characters`, which stand in a parent-child relation to the other NeXML elements. In effect, a valid NeXML document has only three types of immediate children. A typical workflow is therefore to create `Bio::NeXML::Otus`, `Bio::NeXML::Trees`, and `Bio::NeXML::Characters` objects and write them to a NeXML file.
45
+
46
+ ```ruby
47
+ # Parse a test file. This will give us Bio::NeXML::Otus,
48
+ # Bio::NeXML::Trees, and Bio::NeXML::Characters objects.
49
+ doc1 = Bio::NeXML::Parser.new 'test.xml'
50
+ nexml = doc1.parse
51
+ doc1.close
52
+
53
+ # Create a Writer object,
54
+ writer = Bio::NeXML::Writer.new
55
+
56
+ # add otus, trees and characters to it,
57
+ writer << nexml.otus
58
+ writer << nexml.trees
59
+ writer << nexml.characters
60
+
61
+ # save it.
62
+ writer.save 'sample.xml'
63
+ ```
64
+
65
+ `Bio::NeXML::Writer` internally calls the `serialize_*` methods at the lowest level. If need be, these methods can be called directly to obtain the raw NeXML representation of any NeXML element.
66
+
67
+ ``` ruby
68
+ # Create an otus object with a child otu element
69
+ taxa1 = Bio::NeXML::Otus.new 'taxa1', 'A taxa block'
70
+ o1 = Bio::NeXML::Otu.new 'o1', 'A taxon'
71
+ taxa1 << o1
72
+
73
+ # Obtain the raw NeXML representation of the otus object created
74
+ writer = Bio::NeXML::Writer.new
75
+ writer.serialize_otus( taxa1 ).to_s
76
+ # => "<otus label=\"A taxa block\" id=\"taxa1\">\n <otu label=\"A taxon\" id=\"o1\"/>\n</otus>"
77
+ ```
78
+
79
+ The unit tests for the serializer are full of such use cases.
80
+
81
+ ## Nexml
82
+
83
+ ``` ruby
84
+ #get a hash of otus objects indexed with 'id'
85
+ nexml.otus_set
86
+
87
+ #get an array of otus objects
88
+ nexml.otus
89
+
90
+ #get an otus by id
91
+ taxa1 = nexml.get_otus_by_id "taxa1"
92
+
93
+ #iterate over each otus object
94
+ nexml.each_otus do |taxa|
95
+ puts taxa.id
96
+ puts taxa.label
97
+ end
98
+
99
+ #trees
100
+ nexml.trees_set #return a hash of trees objects indexed with 'id'
101
+ nexml.trees #return an array of trees objects
102
+
103
+ #iterate over each trees object
104
+ nexml.each_trees do |trees|
105
+ puts trees.id
106
+ puts trees.label
107
+ end
108
+
109
+ #find a trees by id
110
+ trees1 = nexml.get_trees_by_id 'trees1'
111
+
112
+ # characters
113
+ nexml.characters_set #return a hash of characters objects indexed with 'id'
114
+ nexml.characters #return an array of characters objects
115
+
116
+ #iterate over each characters object
117
+ nexml.each_characters do |ch|
118
+ puts ch.id
119
+ puts ch.label
120
+ end
121
+
122
+ #find a characters object by id
123
+ characters = nexml.get_characters_by_id 'chars1'
124
+ ```
125
+
126
+ ## Otus
127
+
128
+ Taxa blocks and taxa are stored internally in a Ruby hash for faster id-based lookup.
129
+ Consider [this](https://www.nescent.org/wg_phyloinformatics/NeXML_Elements#Example) NeXML
130
+ snippet:
131
+
132
+ ``` ruby
133
+ #get the id of otus
134
+ taxa1.id # "taxa1"
135
+
136
+ #get the label of otus
137
+ taxa1.label # "Primary taxa block"
138
+
139
+ #get a hash of child otu objects indexed with id
140
+ taxa1.otu_set
141
+
142
+ #get an array of child otu objects
143
+ taxa1.otus
144
+
145
+ #get an otu object by id
146
+ #get_otu_by_id is an alias of []
147
+ t1 = taxa1[ 't1' ]
148
+
149
+ #add an otu object to otus
150
+ taxa1.add_otu( otu_object )
151
+ #to add more than one otu object at a time use << or otus= method
152
+ taxa1 << [otu_object1, otu_object2]
153
+ taxa1.otus = otu_object1, otu_object2
154
+
155
+ #or iterate over each otu object
156
+ #each_otu is an alias for each
157
+ taxa1.each do |taxon|
158
+ puts taxon.id
159
+ puts taxon.label
160
+ end
161
+
162
+ #check if an otu with given id belongs to an otus or not
163
+ #include? and has? are aliases for has_otu?
164
+ taxa1.has_otu? 't2' # => true
165
+ taxa1.has? 't8' # => false
166
+
167
+ #an otus object is enumerable
168
+ taxa1.map &:id # => array of otu ids
169
+ taxa1.select {|t| t.class == "Lemurs" } #maybe in future
170
+ ```
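The id-indexed storage described in this section can be sketched in plain Ruby. This is only an illustration of the pattern (a hash keyed by id, wrapped in `Enumerable`), not the bio-nexml implementation; `SketchTaxon` and `SketchTaxaBlock` are hypothetical names that do not exist in the gem.

```ruby
# Hypothetical stand-ins for illustration; these are not the bio-nexml classes.
SketchTaxon = Struct.new(:id, :label)

class SketchTaxaBlock
  include Enumerable

  attr_reader :id, :label

  def initialize(id, label = nil)
    @id = id
    @label = label
    @taxa = {} # id => taxon
  end

  # Accepts a single taxon or an array of taxa; returns self so calls chain.
  def <<(taxa)
    list = taxa.is_a?(Array) ? taxa : [taxa]
    list.each { |taxon| @taxa[taxon.id] = taxon }
    self
  end

  # Hash lookup by id; returns nil when the id is unknown.
  def [](id)
    @taxa[id]
  end

  def has?(id)
    @taxa.key?(id)
  end
  alias include? has?

  # Enumerable builds map, select and friends on top of each.
  def each(&block)
    @taxa.each_value(&block)
  end
end

block = SketchTaxaBlock.new('taxa1', 'Primary taxa block')
block << [SketchTaxon.new('t1', 'Homo sapiens'), SketchTaxon.new('t2', 'Pan paniscus')]
```

With this in place, `block['t1']` is a single hash lookup rather than a scan, and `block.map(&:id)` works because `each` yields the stored taxa.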
171
+
172
+ ### Otu
173
+
174
+ ``` ruby
175
+ #get an otu's id
176
+ t1.id # => "t1"
177
+
178
+ #get an otu's label
179
+ t1.label # => "Homo sapiens"
180
+ ```
181
+
182
+ ## Trees
183
+ Within a trees object, tree and network objects are stored internally in a Ruby hash for faster id-based lookup.
184
+
185
+ ``` ruby
186
+ trees1.class #Bio::NeXML::Trees
187
+
188
+ #get the taxa block to which the trees object is linked
189
+ trees1.otus #returns an otus object
190
+ ```
191
+
192
+ ### Tree
193
+
194
+ ``` ruby
195
+ trees1.tree_set #return a hash of tree objects indexed with 'id'
196
+ trees1.trees #return an array of tree objects
197
+
198
+ #iterate over each tree object
199
+ trees1.each_tree do |t|
200
+ puts t.id
201
+ puts t.label
202
+ end
203
+
204
+ #get a tree object by its id
205
+ tree1 = trees1[ 'tree1' ]
206
+ #or, with a conventional method call
207
+ tree1 = trees1.get_tree_by_id 'tree1'
208
+ #or, from a nexml object
209
+ tree1 = nexml.get_tree_by_id 'tree1'
210
+
211
+ tree1.class #Bio::NeXML::IntTree or Bio::NeXML::FloatTree
212
+
213
+ #check if a tree belongs to a trees or not
214
+ #pass it a tree id
215
+ trees1.has_tree? 'tree1' #return true or false
216
+
217
+ #get the number of trees
218
+ trees1.number_of_trees
219
+ ```
220
+
221
+ ### Network
222
+
223
+ ``` ruby
224
+ trees1.network_set #return a hash of network objects indexed with 'id'
225
+ trees1.networks #return an array of network objects
226
+
227
+ #iterate over each network object
228
+ trees1.each_network do |n|
229
+ puts n.id
230
+ puts n.label
231
+ end
232
+
233
+ #get a network object with its id
234
+ network1 = trees1[ 'network1' ]
235
+ #or, with a conventional method call
236
+ network1 = trees1.get_network_by_id 'network1'
237
+ #or, from a nexml object
238
+ network1 = nexml.get_tree_by_id 'network1'
239
+
240
+ network1.class #Bio::NeXML::IntNetwork or Bio::NeXML::FloatNetwork
241
+
242
+ #check if a network belongs to a trees or not
243
+ #pass it a network id
244
+ trees1.has_network? 'network1' #return true or false
245
+
246
+ #get the number of networks
247
+ trees1.number_of_networks
248
+ ```
249
+
250
+ ### Tree and Network
251
+
252
+ ``` ruby
253
+ #iterate over both trees and networks
254
+ trees1.each do |g|
255
+ puts g.class
256
+ end
257
+
258
+ #find if a tree or a network belongs to a trees or not
259
+ #include? is an alias for has?
260
+ trees1.has? 'tree1' #return true or false
261
+
262
+ #total number of trees and networks
263
+ trees1.number_of_graphs
264
+ ```
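One way to picture a container that holds both trees and networks is a single id-indexed hash with type-filtered iterators layered on top. The sketch below is an assumption about the general pattern, not the gem's code; `SketchTree`, `SketchNetwork` and `SketchGraphBlock` are hypothetical names.

```ruby
# Hypothetical stand-ins for illustration; these are not the bio-nexml classes.
SketchTree    = Struct.new(:id, :label)
SketchNetwork = Struct.new(:id, :label)

class SketchGraphBlock
  include Enumerable

  def initialize
    @graphs = {} # id => tree or network
  end

  # Returns self so that additions can be chained.
  def <<(graph)
    @graphs[graph.id] = graph
    self
  end

  def [](id)
    @graphs[id]
  end

  # Iterates over trees and networks alike.
  def each(&block)
    @graphs.each_value(&block)
  end

  # grep selects members by class, so each iterator sees only one kind.
  def each_tree(&block)
    grep(SketchTree).each(&block)
  end

  def each_network(&block)
    grep(SketchNetwork).each(&block)
  end

  def number_of_graphs
    @graphs.size
  end
end

graphs = SketchGraphBlock.new
graphs << SketchTree.new('tree1', 'A tree') << SketchNetwork.new('network1', 'A network')
```

Keeping one hash means `graphs['tree1']` and `graphs['network1']` share the same lookup path, while `each_tree` and `each_network` narrow the iteration by type.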
265
+
266
+ All the available methods from the [Bio::Tree](http://bioruby.org/rdoc/classes/Bio/Tree.html#M001688)
267
+ class can be called on a tree object.
268
+
269
+ ``` ruby
270
+ node1 = tree1.get_node_by_name "n3" #note name is same as id
271
+ tree1.parents node1
272
+ ```
273
+
274
+ A trees object is an enumerable:
275
+
276
+ ``` ruby
277
+ trees1.map &:id
278
+ ```
279
+
280
+ ## Characters
281
+
282
+ ``` ruby
283
+ puts characters.class
284
+
285
+ #get the taxa block to which the characters object is linked
286
+ characters.otus #returns an otus object
287
+
288
+ #get the child format element
289
+ format = characters.format
290
+
291
+ puts format.class
292
+
293
+ #get the child matrix element
294
+ matrix = characters.matrix
295
+
296
+ puts matrix.class
297
+ ```
298
+
299
+ ### Format
300
+
301
+ ``` ruby
302
+ format.states_set #return a hash of states objects indexed with 'id'
303
+ format.states #return an array of states objects
304
+
305
+ #iterate over each states object
306
+ format.each_states do |states|
307
+ puts states.id
308
+ puts states.label
309
+ end
310
+
311
+ #get a states object by id
312
+ states = format.get_states_by_id 'states1'
313
+
314
+ #check if the states object with 'id' belongs to format or not
315
+ format.has_states? 'states1'
316
+
317
+ format.char_set #return a hash of char objects indexed with 'id'
318
+ format.chars #return an array of char objects
319
+
320
+ #iterate over each char object
321
+ format.each_char do |char|
322
+ puts char.id
323
+ puts char.label
324
+ end
325
+
326
+ #get a char object by id
327
+ char = format.get_char_by_id 'char1'
328
+
329
+ #check if the char object with 'id' belongs to format or not
330
+ format.has_char? 'char1'
331
+
332
+ #get a states or a char object by id
333
+ state = format[ 'states1' ]
334
+ char = format[ 'char1' ]
335
+
336
+ #check if a states or a char object with 'id' belongs to format or not
337
+ format.has? 'states1'
338
+ format.has? 'char1'
339
+
340
+ #all objects, including char and states can be iterated over with each
341
+ format.each do |obj|
342
+ puts obj.class
343
+ end
344
+
345
+ #format is enumerable
346
+ format.map &:id
347
+ ```
348
+
349
+ #### States
350
+
351
+ ``` ruby
352
+ states.state_set #return a hash of state objects indexed with 'id'
353
+ states.states #return an array of state objects
354
+
355
+ #iterate over each state object
356
+ states.each_state do |state|
357
+ puts state.id
358
+ end
359
+ #or, use its alias each
360
+
361
+ #get a state object by id
362
+ state = states.get_state_by_id 'state1'
363
+ #or, use hash notation
364
+ state = states[ 'state1' ]
365
+
366
+ #check if a state belongs to states or not
367
+ states.has_state? 'state1'
368
+ #or, use its aliases has? and include?
369
+ ```
370
+
371
+ ##### State
372
+
373
+ ``` ruby
374
+ #get the symbol associated with the state
375
+ state.symbol
376
+
377
+ #find if the state is ambiguous
378
+ state.ambiguous?
379
+
380
+ #find the kind of ambiguity
381
+ state.ambiguity
382
+
383
+ #find if it is an uncertain state set
384
+ state.uncertain?
385
+
386
+ #find if it is a polymorphic state set
387
+ state.polymorphic?
388
+
389
+ #get the members of a state set as an array
390
+ state.members
391
+
392
+ #or iterate over each member
393
+ state.each do |member|
394
+ puts member.class #same as self
395
+ puts member.id
396
+ end
397
+
398
+ #a state is Enumerable over its members
399
+ state.select{ |member| member.id == "rna5" }
400
+ ```
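The distinction between a fundamental state and an ambiguous state set (uncertain or polymorphic) can be modelled roughly as follows. This is a sketch under assumptions, not the gem's `State` class: `SketchState` is a hypothetical name, and representing the kind of ambiguity as a symbol is an illustrative choice.

```ruby
# Hypothetical stand-in for illustration; this is not the bio-nexml State class.
class SketchState
  include Enumerable

  attr_reader :id, :symbol, :ambiguity, :members

  # ambiguity is nil for a fundamental state, or :uncertain / :polymorphic
  # for a state set that groups other states.
  def initialize(id, symbol, ambiguity = nil, members = [])
    @id = id
    @symbol = symbol
    @ambiguity = ambiguity
    @members = members
  end

  def ambiguous?
    !@ambiguity.nil?
  end

  def uncertain?
    @ambiguity == :uncertain
  end

  def polymorphic?
    @ambiguity == :polymorphic
  end

  # A state set enumerates its member states; a fundamental state yields nothing.
  def each(&block)
    @members.each(&block)
  end
end

a = SketchState.new('s1', 'A')
g = SketchState.new('s3', 'G')
# In the IUPAC nucleotide code, R stands for "A or G".
r = SketchState.new('s5', 'R', :uncertain, [a, g])
```

Here `r.ambiguous?` and `r.uncertain?` are true, `a.ambiguous?` is false, and `r.map(&:symbol)` lists the symbols of the member states.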
401
+
402
+ #### Char
403
+
404
+ ``` ruby
405
+ #get the id
406
+ char.id
407
+
408
+ #get the label
409
+ char.label
410
+
411
+ #get the states object the char is linked to
412
+ char.states
413
+
414
+ #get the codon position for DnaChar and RnaChar objects
415
+ char.codon
416
+ ```
417
+
418
+ ### Matrix
419
+
420
+ ...
421
+
422
+ ## Contributing to bio-nexml
423
+
424
+ * Check out the latest master to make sure the feature hasn't been implemented
425
+ or the bug hasn't been fixed yet
426
+ * Check out the issue tracker to make sure someone already hasn't requested it
427
+ and/or contributed it
428
+ * Fork the project
429
+ * Start a feature/bugfix branch
430
+ * Commit and push until you are happy with your contribution
431
+ * Make sure to add tests for it. This is important so I don't break it in a
432
+ future version unintentionally.
433
+ * Please try not to mess with the Rakefile, version, or history. If you want to
434
+ have your own version, or is otherwise necessary, that is fine, but please
435
+ isolate to its own commit so I can cherry-pick around it.
436
+
437
+ ## Acknowledgements
438
+
439
+ The research leading to these results has received funding from the [European
440
+ Community's] Seventh Framework Programme ([FP7/2007-2013]) under grant agreement
441
+ n° [237046].
442
+
443
+ ## Citing bio-nexml
444
+
445
+ If you use this software, please cite:
446
+
447
+ > [NeXML: rich, extensible, and verifiable representation of comparative data and metadata][1]
448
+
449
+ and
450
+
451
+ > [Biogem: an effective tool based approach for scaling up open source software development in bioinformatics][2]
452
+
453
+ ## Copyright
454
+
455
+ Copyright (c) 2011 Rutger Vos and Anurag Priyam. See LICENSE.txt for further
456
+ details.
457
+
458
+ [1]: http://sysbio.oxfordjournals.org/content/early/2012/02/12/sysbio.sys025.short
459
+ [2]: http://dx.doi.org/10.1093/bioinformatics/bts080