bio-nexml 1.0.0 → 1.1.0

@@ -0,0 +1,3 @@
1
+ bio-nexml\.kpf
2
+ pkg
3
+ Gemfile.lock
@@ -0,0 +1,10 @@
1
+ language: ruby
2
+ rvm:
3
+ - 1.8.7
4
+ - 1.9.2
5
+
6
+ notifications:
7
+ email:
8
+ recipients:
9
+ - anurag08priyam@gmail.com
10
+ - rutgeraldo@gmail.com
data/Gemfile ADDED
@@ -0,0 +1,11 @@
1
+ source "http://rubygems.org"
2
+
3
+ gemspec
4
+ gem "bio", ">= 1.4.2"
5
+
6
+ group :development do
7
+ gem "rake"
8
+ gem "rspec", "~> 2.3.0"
9
+ gem "bundler"
10
+ end
11
+
@@ -0,0 +1,459 @@
1
+ [![Build status](https://secure.travis-ci.org/nexml/bio-nexml.png)](http://travis-ci.org/#!/nexml/bio-nexml)
2
+
3
+ bio-nexml is listed at http://biogems.info/
4
+
5
+ # bio-nexml
6
+
7
+ NeXML is a file format for phylogenetic data. It is inspired by the modular
8
+ architecture of the commonly-used NEXUS file format (hence the name) in that
9
+ a NeXML instance document can contain:
10
+ * sets of Operational Taxonomic Units (OTUs), i.e. the tips in phylogenetic
11
+ trees and the entities on which comparative observations are made. Often these are
12
+ species ("taxa").
13
+ * sets of phylogenetic trees (or reticulate trees, i.e. networks)
14
+ * sets of comparative data, i.e. molecular sequences, morphological categorical
15
+ data, continuous data, and other types.
16
+
17
+ The elements in a NeXML document can be annotated using RDFa
18
+ (http://en.wikipedia.org/wiki/RDFa), which means that every object that can
19
+ be parsed out of a NeXML document must be an object that, in turn, can be
20
+ annotated with predicates (and their namespaces) and other objects (with,
21
+ perhaps, their own namespaces). The advantage over previous file formats is
22
+ that we can retain all metadata for all objects within one file, regardless
23
+ of where the metadata come from.
24
+
25
+ NeXML can be transformed to RDF using an XSL stylesheet. As such, NeXML forms
26
+ an intermediate format between traditional flat file formats (with predictable
27
+ structure but no semantics) and RDF (with loose structure, but lots of
28
+ semantics) that is both easy to work with and ready for the Semantic Web.
29
+
30
+ To learn more, visit http://www.nexml.org
31
+
32
+ ## Parsing
33
+ Currently, all parsing is done up front (i.e. no streaming). This is likely to change later. To parse a NeXML file:
34
+
35
+ ```ruby
36
+ doc = Bio::NeXML::Parser.new( "trees.xml" )
37
+ nexml = doc.parse
38
+ nexml.class #Bio::NeXML::Nexml
39
+ ```
40
+
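To make the "up front versus streaming" distinction concrete, here is a minimal sketch using Ruby's bundled REXML library. It is not how bio-nexml parses (the README states that the gem wraps libxml-ruby); the un-namespaced input document and the names `NEXML_SKETCH`, `eager_otu_ids`, `OtuIdListener`, and `streamed_otu_ids` are assumptions made for illustration only.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Simplified, un-namespaced NeXML-like input (an assumption for brevity;
# real NeXML documents carry namespaces and further attributes).
NEXML_SKETCH = <<~XML
  <nexml>
    <otus id="taxa1" label="A taxa block">
      <otu id="o1" label="A taxon"/>
      <otu id="o2" label="Another taxon"/>
    </otus>
  </nexml>
XML

# Up-front parsing: the whole document tree is built in memory
# before any element is visited.
def eager_otu_ids(xml)
  doc = REXML::Document.new(xml)
  ids = []
  doc.root.each_element('otus/otu') { |otu| ids << otu.attributes['id'] }
  ids
end

# Streaming: a listener receives one event per tag as the input is read,
# so no complete tree is ever held in memory.
class OtuIdListener
  include REXML::StreamListener

  attr_reader :ids

  def initialize
    @ids = []
  end

  def tag_start(name, attributes)
    @ids << attributes['id'] if name == 'otu'
  end
end

def streamed_otu_ids(xml)
  listener = OtuIdListener.new
  REXML::Document.parse_stream(xml, listener)
  listener.ids
end

p eager_otu_ids(NEXML_SKETCH)
p streamed_otu_ids(NEXML_SKETCH)
```

Both calls collect the same otu ids; the difference lies only in when the input is consumed and how much of it is held in memory at once.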
41
+ ## Serializing
42
+ The `Bio::NeXML::Writer` class provides a wrapper over libxml-ruby to create any NeXML document. It defines a set of `serialize_*` instance methods, each of which can be called on the appropriate object to get its NeXML representation. Each method returns an `XML::Node` object; call `to_s` on the return value to get the raw NeXML representation.
43
+
44
+ NeXML defines three top-level containers, `otus`, `trees`, and `characters`, which are the parents of all other NeXML elements. In effect, a valid NeXML document has only three types of immediate children. A typical workflow is therefore to create `Bio::NeXML::Otus`, `Bio::NeXML::Trees`, and `Bio::NeXML::Characters` objects and write them to the NeXML file.
45
+
46
+ ```ruby
47
+ # Parse a test file. This will give us Bio::NeXML::Otus,
48
+ # Bio::NeXML::Trees, and Bio::NeXML::Characters objects.
49
+ doc1 = Bio::NeXML::Parser.new 'test.xml'
50
+ nexml = doc1.parse
51
+ doc1.close
52
+
53
+ # Create a Writer object,
54
+ writer = Bio::NeXML::Writer.new
55
+
56
+ # add otus, trees and characters to it,
57
+ writer << nexml.otus
58
+ writer << nexml.trees
59
+ writer << nexml.characters
60
+
61
+ # save it.
62
+ writer.save 'sample.xml'
63
+ ```
64
+
65
+ Internally, `Bio::NeXML::Writer` calls the `serialize_*` methods at the lowest level. If need be, these methods can be called directly to obtain the raw NeXML representation of any NeXML element.
66
+
67
+ ``` ruby
68
+ # Create an otus object with a child otu element
69
+ taxa1 = Bio::NeXML::Otus.new 'taxa1', 'A taxa block'
70
+ o1 = Bio::NeXML::Otu.new 'o1', 'A taxon'
71
+ taxa1 << o1
72
+
73
+ # Obtain the raw NeXML representation of the otus object created
74
+ writer = Bio::NeXML::Writer.new
75
+ writer.serialize_otus( taxa1 ).to_s
76
+ # => "<otus label=\"A taxa block\" id=\"taxa1\">\n <otu label=\"A taxon\" id=\"o1\"/>\n</otus>"
77
+ ```
78
+
79
+ The unit tests for the serializer are full of such use cases.
80
+
81
+ ## Nexml
82
+
83
+ ``` ruby
84
+ #get a hash of otus objects indexed with 'id'
85
+ nexml.otus_set
86
+
87
+ #get an array of otus objects
88
+ nexml.otus
89
+
90
+ #get an otus by id
91
+ taxa1 = nexml.get_otus_by_id "taxa1"
92
+
93
+ #iterate over each otus object
94
+ nexml.each_otus do |taxa|
95
+ puts taxa.id
96
+ puts taxa.label
97
+ end
98
+
99
+ #trees
100
+ nexml.trees_set #return a hash of trees objects indexed with 'id'
101
+ nexml.trees #return an array of trees objects.
102
+
103
+ #iterate over each trees object
104
+ nexml.each_trees do |trees|
105
+ puts trees.id
106
+ puts trees.label
107
+ end
108
+
109
+ #find a trees by id
110
+ trees1 = nexml.get_trees_by_id 'trees1'
111
+
112
+ # characters
113
+ nexml.characters_set #return a hash of characters objects indexed with 'id'
114
+ nexml.characters #return an array of characters objects
115
+
116
+ #iterate over each characters object
117
+ nexml.each_characters do |ch|
118
+ puts ch.id
119
+ puts ch.label
120
+ end
121
+
122
+ #find a characters object by id
123
+ characters = nexml.get_characters_by_id 'chars1'
124
+ ```
125
+
126
+ ## Otus
127
+
128
+ Taxa blocks and taxa are stored internally as a Ruby hash for faster 'id' based lookup.
129
+ Consider [this](https://www.nescent.org/wg_phyloinformatics/NeXML_Elements#Example) NeXML
130
+ snippet:
131
+
132
+ ``` ruby
133
+ #get the id of otus
134
+ taxa1.id # "taxa1"
135
+
136
+ #get the label of otus
137
+ taxa1.label # "Primary taxa block"
138
+
139
+ #get a hash of child otu objects indexed with id
140
+ taxa1.otu_set
141
+
142
+ #get an array of child otu objects
143
+ taxa1.otus
144
+
145
+ #get an otu object by id
146
+ #get_otu_by_id is an alias of []
147
+ t1 = taxa1[ 't1' ]
148
+
149
+ #add an otu object to otus
150
+ taxa1.add_otu( otu_object )
151
+ #to add more than one otu object at a time use << or otus= method
152
+ taxa1 << [otu_object1, otu_object2]
153
+ taxa1.otus = otu_object1, otu_object2
154
+
155
+ #or iterate over each otu object
156
+ #each_otu is an alias for each
157
+ taxa1.each do |taxon|
158
+ puts taxon.id
159
+ puts taxon.label
160
+ end
161
+
162
+ #check if an otu with given id belongs to an otus or not
163
+ #include? and has? are aliases for has_otu?
164
+ taxa1.has_otu? 't2' # => true
165
+ taxa1.has? 't8' # => false
166
+
167
+ #an otus object is enumerable
168
+ taxa1.map &:id # => array of otu ids
169
+ taxa1.select {|t| t.class == "Lemurs" } #maybe in future
170
+ ```
171
+
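The hash-backed storage described above can be sketched in a few lines of plain Ruby. This is a hypothetical illustration, not the gem's implementation: `IdIndexedSet` and `Taxon` are invented names, and only the general idea (a hash keyed by 'id', plus `Enumerable` over the values) is taken from the text.

```ruby
# Hypothetical container illustrating 'id' based lookup backed by a Hash.
# It is NOT Bio::NeXML::Otus; it only mirrors the access patterns shown above.
class IdIndexedSet
  include Enumerable

  def initialize
    @items = {} # id => object
  end

  # Add an object, indexing it by its id. Returns self so calls can be chained.
  def <<(item)
    @items[item.id] = item
    self
  end

  # Constant-time lookup by id.
  def [](id)
    @items[id]
  end

  # Membership test by id.
  def has?(id)
    @items.key?(id)
  end
  alias include? has?

  # Enumerable only needs each; map, select, etc. follow from it.
  def each(&block)
    @items.each_value(&block)
  end
end

# Stand-in for an otu: anything that responds to id and label.
Taxon = Struct.new(:id, :label)

taxa = IdIndexedSet.new
taxa << Taxon.new('t1', 'Homo sapiens') << Taxon.new('t2', 'Pan paniscus')

puts taxa['t1'].label
puts taxa.has?('t8')
puts taxa.map(&:id).inspect
```

Keeping the objects in a hash makes lookup by id independent of the number of children, while Ruby's insertion-ordered hashes still allow iteration in document order.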
172
+ ### Otu
173
+
174
+ ``` ruby
175
+ #get an otu's id
176
+ t1.id # => "t1"
177
+
178
+ #get an otu's label
179
+ t1.label # => "Homo sapiens"
180
+ ```
181
+
182
+ ## Trees
183
+ Trees blocks, trees, and networks are stored internally as a Ruby hash for faster 'id' based lookup.
184
+
185
+ ``` ruby
186
+ trees1.class #Bio::NeXML::Trees
187
+
188
+ #get the taxa block to which the trees object is linked
189
+ trees1.otus #returns an otus object
190
+ ```
191
+
192
+ ### Tree
193
+
194
+ ``` ruby
195
+ trees1.tree_set #return a hash of tree objects indexed with 'id'
196
+ trees1.trees #return an array of tree objects
197
+
198
+ #iterate over each tree object
199
+ trees1.each_tree do |t|
200
+ puts t.id
201
+ puts t.label
202
+ end
203
+
204
+ #get a tree object by its id
205
+ tree1 = trees1[ 'tree1' ]
206
+ #or, with a conventional method call
207
+ tree1 = trees1.get_tree_by_id 'tree1'
208
+ #or, from a nexml object
209
+ tree1 = nexml.get_tree_by_id 'tree1'
210
+
211
+ tree1.class #Bio::NeXML::IntTree or Bio::NeXML::FloatTree
212
+
213
+ #check if a tree belongs to a trees or not
214
+ #pass it a tree id
215
+ trees1.has_tree? 'tree1' #return true or false
216
+
217
+ #get the number of trees
218
+ trees1.number_of_trees
219
+ ```
220
+
221
+ ### Network
222
+
223
+ ``` ruby
224
+ trees1.network_set #return a hash of network objects indexed with 'id'
225
+ trees1.networks #return an array of network objects
226
+
227
+ #iterate over each network object
228
+ trees1.each_network do |n|
229
+ puts n.id
230
+ puts n.label
231
+ end
232
+
233
+ #get a network object with its id
234
+ network1 = trees1[ 'network1' ]
235
+ #or, with a conventional method call
236
+ network1 = trees1.get_network_by_id 'network1'
237
+ #or, from a nexml object
238
+ network1 = nexml.get_tree_by_id 'network1'
239
+
240
+ network1.class #Bio::NeXML::IntTree or Bio::NeXML::FloatTree
241
+
242
+ #check if a network belongs to a trees or not
243
+ #pass it a network id
244
+ trees1.has_network? 'network1' #return true or false
245
+
246
+ #get the number of networks
247
+ trees1.number_of_networks
248
+ ```
249
+
250
+ ### Tree and Network
251
+
252
+ ``` ruby
253
+ #iterate over both trees and networks
254
+ trees1.each do |g|
255
+ puts g.class
256
+ end
257
+
258
+ #find if a tree or a network belongs to a trees or not
259
+ #include? is an alias for has?
260
+ trees1.has? 'tree1' #return true or false
261
+
262
+ #total number of trees and networks
263
+ trees1.number_of_graphs
264
+ ```
265
+
266
+ All the methods of the [Bio::Tree](http://bioruby.org/rdoc/classes/Bio/Tree.html#M001688)
267
+ class can be called on a tree object.
268
+
269
+ ``` ruby
270
+ node1 = tree1.get_node_by_name "n3" #note name is same as id
271
+ tree1.parents node1
272
+ ```
273
+
274
+ A trees object is enumerable:
275
+
276
+ ``` ruby
277
+ trees1.map &:id
278
+ ```
279
+
280
+ ## Characters
281
+
282
+ ``` ruby
283
+ puts characters.class
284
+
285
+ #get the taxa block to which the characters object is linked
286
+ characters.otus #returns an otus object
287
+
288
+ #get the child format element
289
+ format = characters.format
290
+
291
+ puts format.class
292
+
293
+ #get the child matrix element
294
+ matrix = characters.matrix
295
+
296
+ puts matrix.class
297
+ ```
298
+
299
+ ### Format
300
+
301
+ ``` ruby
302
+ format.states_set #return a hash of states objects indexed with 'id'
303
+ format.states #return an array of states objects
304
+
305
+ #iterate over each states object
306
+ format.each_states do |states|
307
+ puts states.id
308
+ puts states.label
309
+ end
310
+
311
+ #get a states object by id
312
+ states = format.get_states_by_id 'states1'
313
+
314
+ #check if the states object with 'id' belongs to format or not
315
+ format.has_states? 'states1'
316
+
317
+ format.char_set #return a hash of char objects indexed with 'id'
318
+ format.chars #return an array of char objects
319
+
320
+ #iterate over each char object
321
+ format.each_char do |char|
322
+ puts char.id
323
+ puts char.label
324
+ end
325
+
326
+ #get a char object by id
327
+ char = format.get_char_by_id 'char1'
328
+
329
+ #check if the char object with 'id' belongs to format or not
330
+ format.has_char? 'char1'
331
+
332
+ #get a states or a char object by id
333
+ state = format[ 'states1' ]
334
+ char = format[ 'char1' ]
335
+
336
+ #check if a states or a char object with 'id' belongs to format or not
337
+ format.has? 'states1'
338
+ format.has? 'char1'
339
+
340
+ #all objects, including char and states can be iterated over with each
341
+ format.each do |obj|
342
+ puts obj.class
343
+ end
344
+
345
+ #format is enumerable
346
+ format.map &:id
347
+ ```
348
+
349
+ #### States
350
+
351
+ ``` ruby
352
+ states.state_set #return a hash of state objects indexed with 'id'
353
+ states.states #return an array of state objects
354
+
355
+ #iterate over each state object
356
+ states.each_state do |state|
357
+ puts state.id
358
+ end
359
+ #or, use its alias each
360
+
361
+ #get a state object by id
362
+ state = states.get_state_by_id 'state1'
363
+ #or, use hash notation
364
+ state = states[ 'state1' ]
365
+
366
+ #check if a state belongs to states or not
367
+ states.has_state? 'state1'
368
+ #or, use its alias has? and include?
369
+ ```
370
+
371
+ ##### State
372
+
373
+ ``` ruby
374
+ #get the symbol associated with the state
375
+ state.symbol
376
+
377
+ #find if the state is ambiguous
378
+ state.ambiguous?
379
+
380
+ #find the kind of ambiguity
381
+ state.ambiguity
382
+
383
+ #find if it is an uncertain state set
384
+ state.uncertain?
385
+
386
+ #find if it is a polymorphic state set
387
+ state.polymorphic?
388
+
389
+ #get the members of a state set as an array
390
+ state.members
391
+
392
+ #or iterate over each member
393
+ state.each do |member|
394
+ puts member.class #same as self
395
+ puts member.id
396
+ end
397
+
398
+ #a state is Enumerable over its members
399
+ state.select{ |member| member.id == "rna5" }
400
+ ```
401
+
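The State calls above distinguish plain states from ambiguous state sets, which NeXML expresses as polymorphic or uncertain sets with member states. The following plain-Ruby sketch shows one way such an object could behave. `StateSketch`, its constructor arguments, and the use of symbols for the kind of ambiguity are assumptions for illustration; the values bio-nexml actually returns are not specified in this README.

```ruby
# Hypothetical model of a state and of an ambiguous state set.
# NOT the Bio::NeXML::State class; method names mirror the calls shown above.
class StateSketch
  include Enumerable

  attr_reader :id, :symbol, :ambiguity, :members

  # ambiguity is nil for a plain state, or :polymorphic / :uncertain
  # for a state set (symbols are an assumption of this sketch).
  def initialize(id, symbol, ambiguity = nil, members = [])
    @id = id
    @symbol = symbol
    @ambiguity = ambiguity
    @members = members
  end

  def ambiguous?
    !@ambiguity.nil?
  end

  def polymorphic?
    @ambiguity == :polymorphic
  end

  def uncertain?
    @ambiguity == :uncertain
  end

  # A state is Enumerable over its members.
  def each(&block)
    @members.each(&block)
  end
end

# IUPAC nucleotide code R stands for "A or G" (a purine).
a = StateSketch.new('s1', 'A')
g = StateSketch.new('s3', 'G')
r = StateSketch.new('s5', 'R', :uncertain, [a, g])

puts r.ambiguous?
puts r.uncertain?
puts r.map(&:symbol).inspect
puts a.ambiguous?
```

Modelling members as objects of the same class matches the "same as self" comment in the block above and lets `select`, `map`, and the other `Enumerable` methods work on a state set directly.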
402
+ #### Char
403
+
404
+ ``` ruby
405
+ #get the id
406
+ char.id
407
+
408
+ #get the label
409
+ char.label
410
+
411
+ #get the states object the char is linked to
412
+ char.states
413
+
414
+ #get the codon position for DnaChar and RnaChar objects
415
+ char.codon
416
+ ```
417
+
418
+ ### Matrix
419
+
420
+ ...
421
+
422
+ ## Contributing to bio-nexml
423
+
424
+ * Check out the latest master to make sure the feature hasn't been implemented
425
+ or the bug hasn't been fixed yet
426
+ * Check out the issue tracker to make sure someone already hasn't requested it
427
+ and/or contributed it
428
+ * Fork the project
429
+ * Start a feature/bugfix branch
430
+ * Commit and push until you are happy with your contribution
431
+ * Make sure to add tests for it. This is important so I don't break it in a
432
+ future version unintentionally.
433
+ * Please try not to mess with the Rakefile, version, or history. If you want to
434
+ have your own version, or it is otherwise necessary, that is fine, but please
435
+ isolate to its own commit so I can cherry-pick around it.
436
+
437
+ ## Acknowledgements
438
+
439
+ The research leading to these results has received funding from the European
440
+ Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement
441
+ n° 237046.
442
+
443
+ ## Citing bio-nexml
444
+
445
+ If you use this software, please cite:
446
+
447
+ > [NeXML: rich, extensible, and verifiable representation of comparative data and metadata][1]
448
+
449
+ and
450
+
451
+ > [Biogem: an effective tool based approach for scaling up open source software development in bioinformatics][2]
452
+
453
+ ## Copyright
454
+
455
+ Copyright (c) 2011 Rutger Vos and Anurag Priyam. See LICENSE.txt for further
456
+ details.
457
+
458
+ [1]: http://sysbio.oxfordjournals.org/content/early/2012/02/12/sysbio.sys025.short
459
+ [2]: http://dx.doi.org/10.1093/bioinformatics/bts080