biointerchange 0.2.2 → 1.0.0
- data/Gemfile +1 -0
- data/README.md +269 -19
- data/VERSION +1 -1
- data/examples/bininda_emonds_mammals.new +1 -0
- data/examples/rdfization.rb +17 -0
- data/examples/tree1.new +1 -0
- data/examples/tree2.new +1 -0
- data/examples/vocabulary.rb +26 -5
- data/generators/javaify.rb +12 -18
- data/generators/make_supplement_releases.rb +2 -0
- data/generators/pythonify.rb +21 -8
- data/generators/rdfxml.rb +15 -1
- data/lib/biointerchange/cdao.rb +2014 -0
- data/lib/biointerchange/core.rb +70 -77
- data/lib/biointerchange/genomics/gff3_rdf_ntriples.rb +16 -0
- data/lib/biointerchange/genomics/gff3_reader.rb +18 -4
- data/lib/biointerchange/genomics/gvf_reader.rb +14 -0
- data/lib/biointerchange/phylogenetics/cdao_rdf_ntriples.rb +108 -0
- data/lib/biointerchange/phylogenetics/newick_reader.rb +81 -0
- data/lib/biointerchange/phylogenetics/tree_set.rb +50 -0
- data/lib/biointerchange/registry.rb +50 -8
- data/lib/biointerchange/so.rb +150 -0
- data/lib/biointerchange/textmining/pdfx_xml_reader.rb +21 -2
- data/lib/biointerchange/textmining/pubannos_json_reader.rb +24 -1
- data/lib/biointerchange/textmining/text_mining_rdf_ntriples.rb +9 -0
- data/lib/biointerchange/textmining/text_mining_reader.rb +5 -5
- data/spec/phylogenetics_spec.rb +79 -0
- data/supplemental/java/biointerchange/pom.xml +1 -1
- data/supplemental/java/biointerchange/src/main/java/org/biointerchange/vocabulary/CDAO.java +2602 -0
- data/supplemental/java/biointerchange/src/main/java/org/biointerchange/vocabulary/FALDO.java +30 -28
- data/supplemental/java/biointerchange/src/main/java/org/biointerchange/vocabulary/GFF3O.java +136 -104
- data/supplemental/java/biointerchange/src/main/java/org/biointerchange/vocabulary/GVF1O.java +367 -278
- data/supplemental/java/biointerchange/src/main/java/org/biointerchange/vocabulary/SIO.java +4388 -3127
- data/supplemental/java/biointerchange/src/main/java/org/biointerchange/vocabulary/SO.java +5970 -4351
- data/supplemental/java/biointerchange/src/main/java/org/biointerchange/vocabulary/SOFA.java +733 -544
- data/supplemental/java/biointerchange/src/test/java/org/biointerchange/AppTest.java +3 -1
- data/supplemental/python/biointerchange/cdao.py +2021 -0
- data/supplemental/python/biointerchange/faldo.py +37 -38
- data/supplemental/python/biointerchange/gff3o.py +156 -157
- data/supplemental/python/biointerchange/goxref.py +172 -172
- data/supplemental/python/biointerchange/gvf1o.py +428 -429
- data/supplemental/python/biointerchange/sio.py +3133 -3134
- data/supplemental/python/biointerchange/so.py +6626 -6527
- data/supplemental/python/biointerchange/sofa.py +790 -791
- data/supplemental/python/example.py +23 -5
- data/supplemental/python/setup.py +2 -2
- data/web/about.html +1 -0
- data/web/api.html +223 -15
- data/web/biointerchange.js +27 -6
- data/web/cli.html +8 -3
- data/web/index.html +6 -2
- data/web/ontologies.html +3 -0
- data/web/service/rdfizer.fcgi +7 -15
- data/web/webservices.html +6 -2
- metadata +30 -3
data/Gemfile
CHANGED
data/README.md
CHANGED
@@ -9,6 +9,7 @@ Supported input file formats (see examples directory):
|
|
9
9
|
|
10
10
|
* [GFF3](http://www.sequenceontology.org/resources/gff3.html)
|
11
11
|
* [GVF](http://www.sequenceontology.org/resources/gvf.html)
|
12
|
+
* [Newick](http://evolution.genetics.washington.edu/phylip/newicktree.html)
|
12
13
|
* [Pubannos JSON](http://pubannotation.dbcls.jp/)
|
13
14
|
* [PDFx XML](http://pdfx.cs.man.ac.uk/)
|
14
15
|
|
@@ -18,6 +19,8 @@ Supported RDF output formats:
|
|
18
19
|
|
19
20
|
Ontologies used in the RDF output:
|
20
21
|
|
22
|
+
* [Comparative Data Analysis Ontology](http://sourceforge.net/apps/mediawiki/cdao/index.php?title=Main_Page) (CDAO)
|
23
|
+
* [Friend of a Friend](http://xmlns.com/foaf/spec) (FOAF)
|
21
24
|
* [Generic Feature Format Version 3 Ontology](http://www.biointerchange.org/ontologies.html) (GFF3O)
|
22
25
|
* [Genome Variation Format Version 1 Ontology](http://www.biointerchange.org/ontologies.html) (GVF1O)
|
23
26
|
* [Semanticscience Integrated Ontology](http://code.google.com/p/semanticscience/wiki/SIO) (SIO)
|
@@ -48,14 +51,17 @@ BioInterchange's command-line tool `biointerchange` can be installed as a comman
|
|
48
51
|
|
49
52
|
Examples:
|
50
53
|
|
51
|
-
biointerchange --input
|
52
|
-
biointerchange --input
|
54
|
+
biointerchange --input biointerchange.gvf --rdf rdf.biointerchange.gvf --batchsize 100 --file examples/estd176_Banerjee_et_al_2011.2012-11-29.NCBI36.gvf
|
55
|
+
biointerchange --input dbcls.catanns.json --rdf rdf.bh12.sio --file examples/pubannotation.10096561.json --annotate_name 'Peter Smith' --annotate_name_id 'peter.smith@example.com'
|
56
|
+
biointerchange --input uk.ac.man.pdfx --rdf rdf.bh12.sio --file examples/gb-2007-8-3-R40.xml --annotate_name 'Peter Smith' --annotate_name_id 'peter.smith@example.com'
|
57
|
+
biointerchange --input phylotastic.newick --rdf rdf.phylotastic.newick --file examples/tree2.new --annotate_date '1 June 2006'
|
53
58
|
|
54
59
|
Input formats:
|
55
60
|
|
56
61
|
* `biointerchange.gff3`
|
57
62
|
* `biointerchange.gvf`
|
58
63
|
* `dbcls.catanns.json`
|
64
|
+
* `phylotastic.newick`
|
59
65
|
* `uk.ac.man.pdfx`
|
60
66
|
|
61
67
|
Output formats:
|
@@ -63,6 +69,7 @@ Output formats:
|
|
63
69
|
* `rdf.biointerchange.gff3`
|
64
70
|
* `rdf.biointerchange.gvf`
|
65
71
|
* `rdf.bh12.sio`
|
72
|
+
* `rdf.phylotastic.newick`
|
66
73
|
|
67
74
|
#### Using a Triple Store
|
68
75
|
|
@@ -79,7 +86,7 @@ RDF data produced by BioInterchange can be directly loaded into a triple store.
|
|
79
86
|
testrepo> load <path-to-your-rdf-data> .
|
80
87
|
testrepo> sparql select * where { ?s ?p ?o } .
|
81
88
|
|
82
|
-
To list all `seqid` entries from a
|
89
|
+
To list all `seqid` entries from a GVF-file conversion in the store, the following SPARQL query can be used:
|
83
90
|
|
84
91
|
testrepo> sparql select * where { ?s <http://www.biointerchange.org/gvf1o#GVF1_0004> ?o } .
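For a quick check without a triple store, the same selection can be approximated directly on the N-Triples output. The following is a minimal sketch, assuming one triple per line; the sample triples, the constant names and the helper `select_by_predicate` are made up for illustration and are not part of BioInterchange. The naive splitting is no substitute for a real N-Triples parser.

```ruby
# Predicate URI copied from the SPARQL query above.
SEQID_PREDICATE = '<http://www.biointerchange.org/gvf1o#GVF1_0004>'

# Made-up sample triples (hypothetical subjects and values).
SAMPLE_NTRIPLES = [
  '<http://example.org/feature/1> <http://www.biointerchange.org/gvf1o#GVF1_0004> "chr1" .',
  '<http://example.org/feature/1> <http://example.org/other> "ignored" .',
  '<http://example.org/feature/2> <http://www.biointerchange.org/gvf1o#GVF1_0004> "chr2" .'
].join("\n")

# Returns [subject, object] pairs for every triple with the given predicate.
# Hypothetical helper: splits on whitespace after dropping the trailing " .".
def select_by_predicate(ntriples, predicate)
  ntriples.each_line.map do |line|
    subject, pred, object = line.strip.sub(/\s*\.\z/, '').split(' ', 3)
    [subject, object] if pred == predicate
  end.compact
end

select_by_predicate(SAMPLE_NTRIPLES, SEQID_PREDICATE).each do |subject, object|
  puts "#{subject} #{object}"
end
```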
|
85
92
|
|
@@ -96,6 +103,7 @@ Another approach is to load the data and its related GFF3O/GVF1O ontology into [
|
|
96
103
|
|
97
104
|
The following list provides information on the origin of the example-data files in the `examples` directory:
|
98
105
|
|
106
|
+
* `bininda_emonds_mammals.new`: Newick formatted Bininda-Emonds mammals tree (see [The delayed rise of present-day mammals](http://www.ncbi.nlm.nih.gov/pubmed/17392779)). Downloaded from [https://github.com/bendmorris/rdf-treestore/blob/master/trees/bininda_emonds_mammals.new](https://github.com/bendmorris/rdf-treestore/blob/master/trees/bininda_emonds_mammals.new)
|
99
107
|
* `BovineGenomeChrX.gff3.gz`: Gzipped GFF3 file describing a Bos taurus chromosome X. Downloaded from [http://bovinegenome.org/?q=download_chromosome_gff3](http://bovinegenome.org/?q=download_chromosome_gff3)
|
100
108
|
* `chromosome_BF.gff`: GFF3 file of floating contigs from the Baylor Sequencing Centre. Downloaded from [http://dictybase.org/Downloads](http://dictybase.org/Downloads)
|
101
109
|
* `estd176_Banerjee_et_al_2011.2012-11-29.NCBI36.gvf`: GVF file of EBI's [DGVa](http://www.ebi.ac.uk/dgva/database-genomic-variants-archive). Downloaded from [ftp://ftp.ebi.ac.uk/pub/databases/dgva/estd176_Banerjee_et_al_2011/gvf/estd176_Banerjee_et_al_2011.2012-11-29.NCBI36.gvf](ftp://ftp.ebi.ac.uk/pub/databases/dgva/estd176_Banerjee_et_al_2011/gvf/estd176_Banerjee_et_al_2011.2012-11-29.NCBI36.gvf)
|
@@ -106,13 +114,220 @@ The following list provides information on the origin of the example-data files
|
|
106
114
|
|
107
115
|
#### Ruby
|
108
116
|
|
109
|
-
|
117
|
+
BioInterchange is available as a Ruby gem that can be installed as follows:
|
110
118
|
|
111
|
-
gem install biointerchange
|
119
|
+
sudo gem install biointerchange
|
120
|
+
|
121
|
+
The API provides vocabulary wrappers to ontologies that are used within the BioInterchange framework as well as access to RDFization implementations.
|
122
|
+
|
123
|
+
##### Using Vocabulary Wrappers
|
112
124
|
|
113
|
-
|
125
|
+
Ruby classes are provided for the ontologies that are used for serializing RDF. Each ontology is represented by its own Ruby class. The classes provide access to the ontology terms and additional methods for resolving OWL classes, datatype properties and object properties.
|
114
126
|
|
127
|
+
Usage example (see also [vocabulary.rb](https://github.com/BioInterchange/BioInterchange/blob/master/examples/vocabulary.rb)):
|
128
|
+
|
129
|
+
require 'rubygems'
|
115
130
|
require 'biointerchange'
|
131
|
+
|
132
|
+
include BioInterchange
|
133
|
+
|
134
|
+
def print_resource(resource)
|
135
|
+
puts " #{resource}"
|
136
|
+
puts " Ontology class: #{GFF3O.is_class?(resource)}"
|
137
|
+
puts " Ontology object property: #{GFF3O.is_object_property?(resource)}"
|
138
|
+
puts " Ontology datatype property: #{GFF3O.is_datatype_property?(resource)}"
|
139
|
+
end
|
140
|
+
|
141
|
+
# Get the URI of an ontology term by label:
|
142
|
+
puts "'seqid' property:"
|
143
|
+
print_resource(GFF3O.seqid())
|
144
|
+
|
145
|
+
# Ambiguous labels will return an array of URIs:
|
146
|
+
# "start" can refer to a sub-property of "feature_properties" or "target_properties"
|
147
|
+
puts "'start' properties:"
|
148
|
+
GFF3O.start().each { |start_synonym|
|
149
|
+
print_resource(start_synonym)
|
150
|
+
}
|
151
|
+
# "feature_properties" can be either a datatype or object property
|
152
|
+
puts "'feature_properties' properties:"
|
153
|
+
GFF3O.feature_properties().each { |feature_properties_synonym|
|
154
|
+
print_resource(feature_properties_synonym)
|
155
|
+
}
|
156
|
+
|
157
|
+
# Use built-in method "is_datatype_property?" to resolve ambiguity:
|
158
|
+
# (Note: there is exactly one item in the result set, so the selection of the first item is acceptable.)
|
159
|
+
feature_properties = GFF3O.feature_properties().select { |uri| GFF3O.is_datatype_property?(uri) }
|
160
|
+
puts "'feature_properties' properties, which are a datatype property:"
|
161
|
+
feature_properties.each { |feature_property|
|
162
|
+
print_resource(feature_property)
|
163
|
+
}
|
164
|
+
|
165
|
+
# Use built-in method "with_parent" to pick properties based on their context:
|
166
|
+
puts "'start' property with parent datatype property 'feature_properties':"
|
167
|
+
GFF3O.with_parent(GFF3O.start(), feature_properties[0]).each { |feature_property|
|
168
|
+
print_resource(feature_property)
|
169
|
+
}
|
170
|
+
|
171
|
+
With the BioInterchange gem installed, the example can be executed on the command line via:
|
172
|
+
|
173
|
+
git clone git://github.com/BioInterchange/BioInterchange.git
|
174
|
+
cd BioInterchange
|
175
|
+
git checkout v1.0.0
|
176
|
+
ruby examples/vocabulary.rb
|
177
|
+
|
178
|
+
##### RDFization Framework
|
179
|
+
|
180
|
+
Usage example (see also [rdfization.rb](https://github.com/BioInterchange/BioInterchange/blob/master/examples/rdfization.rb)):
|
181
|
+
|
182
|
+
require 'rubygems'
|
183
|
+
require 'biointerchange'
|
184
|
+
|
185
|
+
include BioInterchange::Phylogenetics
|
186
|
+
|
187
|
+
# Create a reader that reads phylogenetic trees in Newick format:
|
188
|
+
reader = NewickReader.new()
|
189
|
+
|
190
|
+
# Create a model from a single example tree:
|
191
|
+
# (Note: the `deserialize` method also takes streams as parameter -- not just strings.)
|
192
|
+
model = reader.deserialize('((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;')
|
193
|
+
|
194
|
+
# Serialize the model as RDF N-Triples to STDOUT:
|
195
|
+
CDAORDFWriter.new(STDOUT).serialize(model)
|
196
|
+
|
197
|
+
##### Implementing New Readers, Models and Writers
|
198
|
+
|
199
|
+
New readers, models and writers are best adapted from or built upon the existing implementations. The phylogenetic trinity of Newick file format reader, [BioRuby](http://bioruby.org) based tree model, and [CDAO](http://sourceforge.net/apps/mediawiki/cdao/index.php?title=Main_Page) RDF writer is used here as a guideline due to its simplicity.
|
200
|
+
|
201
|
+
###### Reader: Creating an Object Model
|
202
|
+
|
203
|
+
The quintessential Newick tree reader is depicted below. Its class is placed in a Ruby module that encapsulates all phylogenetics-related source code. The `NewickReader` class inherits from the BioInterchange framework class `Reader`, which provides method stubs that need to be overridden. Using the central registry `BioInterchange::Registry`, the reader informs the framework of its unique identifier (`phylotastic.newick`), its Ruby class (`NewickReader`), the command-line parameters that it accepts (`date`, which becomes `--annotate_date`), whether the reader can operate without reading the complete input all at once (`true`), a descriptive name of the reader (`Newick Tree [...]`), and an array with descriptions for each parameter stated above.
|
204
|
+
|
205
|
+
Deserialization of Newick trees is done using the `deserialize` method, which must accept both strings and input streams as valid arguments. If this constraint is not satisfied, then an `ImplementationReaderError` is raised, which is caught by the framework and handled appropriately.
|
206
|
+
|
207
|
+
Finally, the `postponed?` method keeps track of deferred input processing. If the batch size was reached and the model was passed on for serialization to a writer, then this method will have to return `true`.
|
208
|
+
|
209
|
+
require 'bio'
|
210
|
+
require 'date'
|
211
|
+
|
212
|
+
module BioInterchange::Phylogenetics
|
213
|
+
|
214
|
+
class NewickReader < BioInterchange::Reader
|
215
|
+
|
216
|
+
# Register reader:
|
217
|
+
BioInterchange::Registry.register_reader(
|
218
|
+
'phylotastic.newick',
|
219
|
+
NewickReader,
|
220
|
+
[ 'date' ],
|
221
|
+
true,
|
222
|
+
'Newick Tree File Format reader',
|
223
|
+
[
|
224
|
+
[ 'date <date>', 'date when the Newick file was created (optional)' ]
|
225
|
+
]
|
226
|
+
)
|
227
|
+
|
228
|
+
# Creates a new instance of a Newick file format reader.
|
229
|
+
#
|
230
|
+
# The reader supports batch processing.
|
231
|
+
#
|
232
|
+
# +date+:: Optional date of when the Newick file was produced, annotated, etc.
|
233
|
+
# +batch_size+:: Optional integer that determines the number of features that
|
234
|
+
# should be processed in one go.
|
235
|
+
def initialize(date = nil, batch_size = nil)
|
236
|
+
@date = date
|
237
|
+
@batch_size = batch_size
|
238
|
+
end
|
239
|
+
|
240
|
+
# Reads a Newick file from the input stream and returns an associated model.
|
241
|
+
#
|
242
|
+
# If this method is called when +postponed?+ returns true, then the reading will
|
243
|
+
# continue from where it has been interrupted beforehand.
|
244
|
+
#
|
245
|
+
# +inputstream+:: an instance of class IO or String that holds the contents of a Newick file
|
246
|
+
def deserialize(inputstream)
|
247
|
+
if inputstream.kind_of?(IO)
|
248
|
+
create_model(inputstream)
|
249
|
+
elsif inputstream.kind_of?(String) then
|
250
|
+
create_model(StringIO.new(inputstream))
|
251
|
+
else
|
252
|
+
raise BioInterchange::Exceptions::ImplementationReaderError, 'The provided input stream needs to be either of type IO or String.'
|
253
|
+
end
|
254
|
+
end
|
255
|
+
|
256
|
+
# Returns true if the reading of the input was postponed due to a full batch.
|
257
|
+
def postponed?
|
258
|
+
@postponed
|
259
|
+
end
|
260
|
+
|
261
|
+
protected
|
262
|
+
|
263
|
+
# ...concrete implementation omitted.
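The interplay between `deserialize`, the batch size and `postponed?` can be sketched with a stand-in reader. This is a minimal illustration under stated assumptions, not the omitted `NewickReader` implementation: the class `LineBatchReader`, its `read_batch` method and the helper `read_all_batches` are hypothetical, each input line is treated as one feature, and an `ArgumentError` takes the place of the framework's `ImplementationReaderError`.

```ruby
require 'stringio'

# Hypothetical stand-in for a batch-capable reader (not BioInterchange's NewickReader).
class LineBatchReader
  def initialize(batch_size = nil)
    @batch_size = batch_size
    @postponed = false
  end

  # Accepts a String or a stream; anything else is rejected.
  def deserialize(inputstream)
    if inputstream.kind_of?(String)
      read_batch(StringIO.new(inputstream))
    elsif inputstream.kind_of?(IO) || inputstream.kind_of?(StringIO)
      read_batch(inputstream)
    else
      raise ArgumentError, 'The provided input stream needs to be either of type IO or String.'
    end
  end

  # True if reading stopped early because a batch was full and input remains.
  def postponed?
    @postponed
  end

  protected

  # Reads lines until the batch is full or the stream is exhausted.
  def read_batch(stream)
    batch = []
    while (line = stream.gets)
      batch << line.chomp
      if @batch_size && batch.length >= @batch_size
        @postponed = !stream.eof?
        return batch
      end
    end
    @postponed = false
    batch
  end
end

# Hypothetical driver: keeps calling deserialize on the same stream
# for as long as the reader reports postponed input.
def read_all_batches(reader, stream)
  batches = []
  loop do
    batches << reader.deserialize(stream)
    break unless reader.postponed?
  end
  batches
end

stream = StringIO.new("(A,B);\n(C,D);\n(E,F);\n")
read_all_batches(LineBatchReader.new(2), stream).each do |batch|
  puts batch.inspect
end
```

In this sketch, a caller would hand each batch to a writer before requesting the next one, so that only one batch is held in memory at a time.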
|
264
|
+
|
265
|
+
###### Tree Model
|
266
|
+
|
267
|
+
A model is created by a reader and subsequently consumed by a writer. The phylogenetic tree model inherits from `BioInterchange::Model`, which defines the `prune` method. If batch operation is in place, i.e. the input is not completely read into memory, then the `prune` method will be called to instruct the model to drop all information that no longer needs to be kept in memory. In a sense, this can be seen as a form of garbage collection, where data that has been serialized is purged from memory.
|
268
|
+
|
269
|
+
module BioInterchange::Phylogenetics
|
270
|
+
|
271
|
+
# A phylogenetic tree set that can contain multiple phylogenetic trees.
|
272
|
+
class TreeSet < BioInterchange::Model
|
273
|
+
|
274
|
+
# Create a new instance of a tree set. A tree set can contain multiple phylogenetic trees.
|
275
|
+
def initialize
|
276
|
+
# Trees are stored as the keys of a hash map to increase performance:
|
277
|
+
@set = {}
|
278
|
+
end
|
279
|
+
|
280
|
+
# ...omitted internal data structure handling.
|
281
|
+
|
282
|
+
# Removes all features from the set, but keeps additional data (e.g., the date).
|
283
|
+
def prune
|
284
|
+
@set.clear
|
285
|
+
end
|
286
|
+
|
287
|
+
end
|
288
|
+
|
289
|
+
end
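How `prune` behaves during batch operation can be sketched with a stand-in model. The class `SketchTreeSet` below is hypothetical and does not inherit from `BioInterchange::Model`; its `add` method and the `date` accessor are assumptions made for illustration, while `contents` mirrors the accessor that the writer calls on the model.

```ruby
# Hypothetical stand-in for a tree-set model (not BioInterchange's TreeSet).
class SketchTreeSet
  attr_reader :date

  def initialize(date = nil)
    # Trees are stored as hash keys, following the README's TreeSet.
    @set = {}
    @date = date
  end

  # Adds a tree to the set (assumed method name).
  def add(tree)
    @set[tree] = true
  end

  # Returns all trees currently held in memory.
  def contents
    @set.keys
  end

  # Drops the trees that have been serialized, but keeps the date.
  def prune
    @set.clear
  end
end

model = SketchTreeSet.new('1 June 2006')
model.add('(A,B);')
model.add('(C,D);')
puts model.contents.length  # two trees held before pruning
model.prune
puts model.contents.empty?  # trees dropped, date still available via model.date
```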
|
290
|
+
|
291
|
+
###### Writer: From Object Model to RDF
|
292
|
+
|
293
|
+
The writer takes an object model and serializes it via the `BioInterchange::Writer` derived `serialize` method. A writer uses `BioInterchange::Registry` to make itself known to the BioInterchange framework, where it signs up using the following arguments: a unique identifier (`rdf.phylotastic.newick`), its implementing class (`CDAORDFWriter`), a list of readers that it is compatible with (`phylotastic.newick`), whether the writer supports batch processing where only parts of the input need to be kept in memory (`true`), and a descriptive name for the writer.
|
294
|
+
|
295
|
+
require 'rdf'
|
296
|
+
require 'rdf/ntriples'
|
297
|
+
|
298
|
+
module BioInterchange::Phylogenetics
|
299
|
+
|
300
|
+
# Serializes phylogenetic tree models based on BioRuby's phylogenetic tree implementation.
|
301
|
+
class CDAORDFWriter < BioInterchange::Writer
|
302
|
+
|
303
|
+
# Register writers:
|
304
|
+
BioInterchange::Registry.register_writer(
|
305
|
+
'rdf.phylotastic.newick',
|
306
|
+
CDAORDFWriter,
|
307
|
+
[ 'phylotastic.newick' ],
|
308
|
+
true,
|
309
|
+
'Comparative Data Analysis Ontology (CDAO) based RDFization'
|
310
|
+
)
|
311
|
+
|
312
|
+
# Creates a new instance of a CDAORDFWriter that will use the provided output stream to serialize RDF.
|
313
|
+
#
|
314
|
+
# +ostream+:: instance of an IO class or derivative that is used for RDF serialization
|
315
|
+
def initialize(ostream)
|
316
|
+
@ostream = ostream
|
317
|
+
end
|
318
|
+
|
319
|
+
# Serialize a model as RDF.
|
320
|
+
#
|
321
|
+
# +model+:: a generic representation of input data that is an instance of BioInterchange::Phylogenetics::TreeSet
|
322
|
+
def serialize(model)
|
323
|
+
model.contents.each { |tree|
|
324
|
+
serialize_model(model, tree)
|
325
|
+
}
|
326
|
+
end
|
327
|
+
|
328
|
+
protected
|
329
|
+
|
330
|
+
# ...omitted actual serialization implementation.
|
116
331
|
|
117
332
|
#### Python
|
118
333
|
|
@@ -122,28 +337,54 @@ BioInterchange available.
|
|
122
337
|
To install the BioInterchange egg, run:
|
123
338
|
|
124
339
|
sudo easy_install rdflib
|
125
|
-
sudo easy_install http://www.biointerchange.org/eggs/biointerchange-0.
|
340
|
+
sudo easy_install http://www.biointerchange.org/eggs/biointerchange-1.0.0-py2.7.egg
|
126
341
|
|
127
342
|
Usage examples:
|
128
343
|
|
129
344
|
import biointerchange
|
130
345
|
from biointerchange import *
|
346
|
+
from rdflib.namespace import Namespace
|
347
|
+
|
348
|
+
def print_resource(resource):
|
349
|
+
print " " + resource
|
350
|
+
print " Ontology class: " + str(GFF3O.is_class(resource))
|
351
|
+
print " Ontology object property: " + str(GFF3O.is_object_property(resource))
|
352
|
+
print " Ontology datatype property: " + str(GFF3O.is_datatype_property(resource))
|
131
353
|
|
132
354
|
# Get the URI of an ontology term by label:
|
133
|
-
|
355
|
+
print "'seqid' property:"
|
356
|
+
print_resource(GFF3O.seqid())
|
134
357
|
|
135
358
|
# Ambiguous labels will return an array of URIs:
|
136
359
|
# "start" can refer to a sub-property of "feature_properties" or "target_properties"
|
137
|
-
|
360
|
+
print "'start' properties:"
|
361
|
+
for start_synonym in GFF3O.start():
|
362
|
+
print_resource(start_synonym)
|
363
|
+
|
138
364
|
# "feature_properties" can be either a datatype or object property
|
139
|
-
|
365
|
+
print "'feature_properties' properties:"
|
366
|
+
for feature_properties_synonym in GFF3O.feature_properties():
|
367
|
+
print_resource(feature_properties_synonym)
|
140
368
|
|
141
369
|
# Use built-in method "is_datatype_property" to resolve ambiguity:
|
142
370
|
# (Note: there is exactly one item in the result set, so the selection of the first item is acceptable.)
|
143
|
-
feature_properties = filter(lambda uri: GFF3O.is_datatype_property(uri), GFF3O.feature_properties())
|
371
|
+
feature_properties = filter(lambda uri: GFF3O.is_datatype_property(uri), GFF3O.feature_properties())
|
372
|
+
print "'feature_properties' properties, which are a datatype property:"
|
373
|
+
for feature_property in feature_properties:
|
374
|
+
print_resource(feature_property)
|
144
375
|
|
145
376
|
# Use built-in method "with_parent" to pick properties based on their context:
|
146
|
-
|
377
|
+
print "'start' property with parent datatype property 'feature_properties':"
|
378
|
+
for feature_property in GFF3O.with_parent(GFF3O.start(), feature_properties[0]):
|
379
|
+
print_resource(feature_property)
|
380
|
+
|
381
|
+
The example can be executed on the command line via:
|
382
|
+
|
383
|
+
git clone git://github.com/BioInterchange/BioInterchange.git
|
384
|
+
cd BioInterchange
|
385
|
+
git checkout v1.0.0
|
386
|
+
cd supplemental/python
|
387
|
+
python example.py
|
147
388
|
|
148
389
|
#### Java
|
149
390
|
|
@@ -163,15 +404,10 @@ To use the BioInterchange artifact, set-up add the following to your Maven POM f
|
|
163
404
|
<dependency>
|
164
405
|
<groupId>org.biointerchange</groupId>
|
165
406
|
<artifactId>vocabularies</artifactId>
|
166
|
-
<version>0.
|
407
|
+
<version>1.0.0</version>
|
167
408
|
</dependency>
|
168
409
|
</dependencies>
|
169
410
|
|
170
|
-
Current vocabularies:
|
171
|
-
|
172
|
-
* Generic Feature Format Version 3 Ontology (GFF3O)
|
173
|
-
* Genome Variation Format Version 1 Ontology (GVF1O)
|
174
|
-
|
175
411
|
Usage examples of accessing GFF3O's vocabulary:
|
176
412
|
|
177
413
|
package org.biointerchange;
|
@@ -233,6 +469,14 @@ Usage examples of accessing GFF3O's vocabulary:
|
|
233
469
|
}
|
234
470
|
}
|
235
471
|
|
472
|
+
Another example that uses SIO instead of GFF3O is provided as [AppSIO.java](https://github.com/BioInterchange/BioInterchange/blob/master/supplemental/java/biointerchange/src/main/java/org/biointerchange/AppSIO.java).
|
473
|
+
|
474
|
+
The examples can be executed through Maven:
|
475
|
+
|
476
|
+
cd supplemental/java/biointerchange
|
477
|
+
mvn exec:java -Dexec.mainClass="org.biointerchange.App"
|
478
|
+
mvn exec:java -Dexec.mainClass="org.biointerchange.AppSIO"
|
479
|
+
|
236
480
|
### RESTful Web-Service
|
237
481
|
|
238
482
|
A RESTful web-service is available via the URI: [http://www.biointerchange.org/service/rdfizer.biocgi](http://www.biointerchange.org/service/rdfizer.biocgi)
|
@@ -251,11 +495,13 @@ RDFization parameters and data are send as a single HTTP POST requests containin
|
|
251
495
|
* `biointerchange.gff3`: [Generic Feature Format Version 3](http://www.sequenceontology.org/resources/gff3.html)
|
252
496
|
* `biointerchange.gvf`: [Genome Variation Format](http://www.sequenceontology.org/resources/gvf.html)
|
253
497
|
* `dbcls.catanns.json`: [PubAnnotation categorical annotations](http://pubannotation.dbcls.jp) JSON
|
498
|
+
* `phylotastic.newick`: [Newick](http://evolution.genetics.washington.edu/phylip/newicktree.html)
|
254
499
|
* `uk.ac.man.pdfx`: [PDFx](http://pdfx.cs.man.ac.uk) XML
|
255
500
|
* `OUTPUT_METHOD`: determines the RDFization method that should be used, output will always be RDF N-Triples; available output formats are
|
256
501
|
* `rdf.biointerchange.gff3`: RDFization of `biointerchange.gff3`
|
257
502
|
* `rdf.biointerchange.gvf`: RDFization of `biointerchange.gvf`
|
258
503
|
* `rdf.bh12.sio`: RDFization of `dbcls.catanns.json` or `uk.ac.man.pdfx`
|
504
|
+
* `rdf.phylotastic.newick`: RDFization of `phylotastic.newick`
|
259
505
|
* `URL_ENCODED_DATA`: data for RDFization as [URL encoded](http://en.wikipedia.org/wiki/Percent-encoding) string
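Producing the `URL_ENCODED_DATA` value can be sketched in Ruby with the standard library. This is an illustration only: the helper name `url_encode_data` is hypothetical, and it assumes that form-style encoding (where a space becomes `+` rather than `%20`) is acceptable to the service, which this README does not state.

```ruby
require 'uri'

# Hypothetical helper: URL-encodes input data for use as URL_ENCODED_DATA.
# Assumption: form-style encoding is accepted by the web-service.
def url_encode_data(data)
  URI.encode_www_form_component(data)
end

newick = '((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;'
puts url_encode_data(newick)
```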
|
260
506
|
|
261
507
|
#### Example
|
@@ -310,10 +556,13 @@ The last step, `bundle`, will install gem dependencies of BioInterchange automat
|
|
310
556
|
|
311
557
|
### Building Vocabulary Classes
|
312
558
|
|
313
|
-
Building a new version of the Ruby vocabulary classes for FALDO, GFF3O, GVF1O, SIO, SOFA (requires that the OBO files are saved as RDF/XML using [Protege](http://protege.stanford.edu); Apache [Jena](http://jena.apache.org)'s `rdfcat` tool is required to reformat RDF Turtle as RDF/XML):
|
559
|
+
Building a new version of the Ruby vocabulary classes for CDAO, FALDO, GFF3O, GVF1O, SIO, SOFA (requires that the OBO files are saved as RDF/XML using [Protege](http://protege.stanford.edu); Apache [Jena](http://jena.apache.org)'s `rdfcat` tool is required to reformat RDF Turtle as RDF/XML):
|
314
560
|
|
315
561
|
sudo gem install rdf
|
316
562
|
sudo gem install rdf-rdfxml
|
563
|
+
echo -e "require 'rdf'\nmodule BioInterchange\n" > lib/biointerchange/cdao.rb
|
564
|
+
ruby generators/rdfxml.rb <path-to-rdf/xml-version-of-cdao> CDAO >> lib/biointerchange/cdao.rb
|
565
|
+
echo -e "\nend" >> lib/biointerchange/cdao.rb
|
317
566
|
echo -e "require 'rdf'\nmodule BioInterchange\n" > lib/biointerchange/faldo.rb
|
318
567
|
rdfcat -ttl <path-to-turtle-version-of-faldo> > faldo.xml.tmp
|
319
568
|
ruby generators/rdfxml.rb faldo.xml.tmp FALDO >> lib/biointerchange/faldo.rb
|
@@ -438,6 +687,7 @@ In alphabetical order of the last name:
|
|
438
687
|
* [Kevin B. Cohen](http://compbio.ucdenver.edu/Hunter_lab/Cohen/index.shtml)
|
439
688
|
* [Geraint Duck](http://www.cs.man.ac.uk/~duckg)
|
440
689
|
* [Michel Dumontier](http://dumontierlab.com)
|
690
|
+
* [Begum Durgahee](http://utah.academia.edu/BDurgahee)
|
441
691
|
* [Jin-Dong Kim](http://www.bioontology.org/Jin-Dong_Kim)
|
442
692
|
|
443
693
|
Cite
|
data/VERSION
CHANGED
@@ -1 +1 @@
|
|
1
|
-
0.
|
1
|
+
1.0.0
|