om 1.2.5 → 1.3.0
- data/COMMON_OM_PATTERNS.textile +127 -0
- data/GETTING_FANCY.textile +15 -13
- data/GETTING_STARTED.textile +89 -51
- data/QUERYING_DOCUMENTS.textile +30 -25
- data/README.rdoc +5 -1
- data/README.textile +4 -0
- data/lib/om/samples/mods_article.rb +7 -4
- data/lib/om/version.rb +1 -1
- data/lib/om/xml/document.rb +10 -2
- data/lib/om/xml/named_term_proxy.rb +7 -2
- data/lib/om/xml/term.rb +59 -9
- data/lib/om/xml/term_value_operators.rb +3 -0
- data/lib/om/xml/terminology.rb +29 -0
- data/lib/om/xml.rb +3 -0
- data/spec/fixtures/mods_article_terminology.xml +2882 -0
- data/spec/unit/document_spec.rb +12 -0
- data/spec/unit/named_term_proxy_spec.rb +22 -9
- data/spec/unit/term_builder_spec.rb +1 -1
- data/spec/unit/term_spec.rb +20 -3
- data/spec/unit/term_value_operators_spec.rb +24 -0
- data/spec/unit/terminology_spec.rb +5 -5
- data/spec/unit/xml_serialization_spec.rb +63 -0
- metadata +12 -8
- data/VERSION +0 -1
data/COMMON_OM_PATTERNS.textile
CHANGED
@@ -2,6 +2,133 @@ h1. Common Patterns You'll Use with OM
|
|
2
2
|
|
3
3
|
h2. Common Terminology Patterns
|
4
4
|
|
5
|
+
h4. element value
|
6
|
+
|
7
|
+
We want an OM term to be assigned so its value is the value of an element.
|
8
|
+
|
9
|
+
Given this xml:
|
10
|
+
|
11
|
+
<pre>
|
12
|
+
<outer>
|
13
|
+
<element>my value</element>
|
14
|
+
</outer>
|
15
|
+
</pre>
|
16
|
+
|
17
|
+
we want to have an OM term named "element" with "my value" as the value.
|
18
|
+
|
19
|
+
In the Datastream Model:
|
20
|
+
|
21
|
+
<pre>
|
22
|
+
# defines the expected OM terminology for example xml
|
23
|
+
class ExampleXMLDS < ActiveFedora::NokogiriDatastream
|
24
|
+
# OM (Opinionated Metadata) terminology mapping
|
25
|
+
set_terminology do |t|
|
26
|
+
t.root(:path => "outer", :xmlns => '')
|
27
|
+
t.element
|
28
|
+
end
|
29
|
+
end # class
|
30
|
+
</pre>
|
31
|
+
|
32
|
+
Q: What if we don't want the OM term to be named "element," but "giraffe"?
|
33
|
+
|
34
|
+
A:
|
35
|
+
|
36
|
+
<pre>
|
37
|
+
class ExampleXMLDS < ActiveFedora::NokogiriDatastream
|
38
|
+
set_terminology do |t|
|
39
|
+
t.root(:path => "outer", :xmlns => '')
|
40
|
+
t.giraffe(:path => "element")
|
41
|
+
end
|
42
|
+
end # class
|
43
|
+
</pre>
|
44
|
+
|
45
|
+
h4. element value given a specific attribute value
|
46
|
+
|
47
|
+
We want an OM term to be assigned to an element's value, but only when the element has a specific attribute value.
|
48
|
+
|
49
|
+
Given this xml:
|
50
|
+
|
51
|
+
<pre>
|
52
|
+
<outer>
|
53
|
+
<element my_attr="attr value">element value</element>
|
54
|
+
</outer>
|
55
|
+
</pre>
|
56
|
+
|
57
|
+
we want to have an OM term named "element" with "element value" as the value.
|
58
|
+
|
59
|
+
In the Datastream Model:
|
60
|
+
|
61
|
+
<pre>
|
62
|
+
# defines the expected OM terminology for example xml
|
63
|
+
class ExampleXMLDS < ActiveFedora::NokogiriDatastream
|
64
|
+
# OM (Opinionated Metadata) terminology mapping
|
65
|
+
set_terminology do |t|
|
66
|
+
t.root(:path => "outer", :xmlns => '')
|
67
|
+
t.element(:attributes=>{:my_attr=>"attr value"})
|
68
|
+
end
|
69
|
+
end # class
|
70
|
+
</pre>
|
71
|
+
|
72
|
+
Q: What if we don't want the OM term to be named "element," but "gazelle"?
|
73
|
+
|
74
|
+
A:
|
75
|
+
|
76
|
+
<pre>
|
77
|
+
class ExampleXMLDS < ActiveFedora::NokogiriDatastream
|
78
|
+
set_terminology do |t|
|
79
|
+
t.root(:path => "outer", :xmlns => '')
|
80
|
+
t.gazelle(:path => "element", :attributes=>{:my_attr=>"attr value"})
|
81
|
+
end
|
82
|
+
end
|
83
|
+
</pre>
|
84
|
+
|
85
|
+
h4. attribute value
|
86
|
+
|
87
|
+
We want an OM term to be assigned to an element's attribute value.
|
88
|
+
|
89
|
+
Given this xml:
|
90
|
+
|
91
|
+
<pre>
|
92
|
+
<outer>
|
93
|
+
<element my_attr="attr value">element value</element>
|
94
|
+
</outer>
|
95
|
+
</pre>
|
96
|
+
|
97
|
+
we want to have an OM term named "my_attr" with "attr value" as the value.
|
98
|
+
|
99
|
+
In the Datastream Model:
|
100
|
+
|
101
|
+
FIXME: is this correct? (Naomi wonders 2011-06-22)
|
102
|
+
|
103
|
+
<pre>
|
104
|
+
# defines the expected OM terminology for example xml
|
105
|
+
class ExampleXMLDS < ActiveFedora::NokogiriDatastream
|
106
|
+
# OM (Opinionated Metadata) terminology mapping
|
107
|
+
set_terminology do |t|
|
108
|
+
t.root(:path => "outer", :xmlns => '')
|
109
|
+
t.element {
|
110
|
+
t.my_attr(:path => {:attribute=>"my_attr"})
|
111
|
+
}
|
112
|
+
end
|
113
|
+
end # class
|
114
|
+
</pre>
|
115
|
+
|
116
|
+
Q: What if we don't want the OM term to be named "my_attr," but "hippo"?
|
117
|
+
|
118
|
+
A:
|
119
|
+
|
120
|
+
<pre>
|
121
|
+
class ExampleXMLDS < ActiveFedora::NokogiriDatastream
|
122
|
+
set_terminology do |t|
|
123
|
+
t.root(:path => "outer", :xmlns => '')
|
124
|
+
t.element {
|
125
|
+
t.hippo(:path => {:attribute=>"my_attr"})
|
126
|
+
}
|
127
|
+
end
|
128
|
+
end
|
129
|
+
</pre>
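For comparison, the three patterns above can be tried as plain XPath lookups outside of OM. This is a minimal sketch using Ruby's bundled REXML library instead of OM or Nokogiri; the XPath strings are assumptions about what each term is meant to select, not the queries OM generates.

```ruby
require "rexml/document"

# Sample XML from the patterns above (with a proper opening <outer> tag).
xml = <<~XML
  <outer>
    <element my_attr="attr value">element value</element>
  </outer>
XML
doc = REXML::Document.new(xml)

# Pattern 1: element value
element_value = REXML::XPath.first(doc, "/outer/element").text

# Pattern 2: element value, but only when my_attr has a specific value
filtered_value = REXML::XPath.first(doc, "/outer/element[@my_attr='attr value']").text

# Pattern 3: attribute value of the element
attr_value = REXML::XPath.first(doc, "/outer/element").attributes["my_attr"]

puts element_value
puts filtered_value
puts attr_value
```

Renaming a term (giraffe, gazelle, hippo) changes only the Ruby-side name; the XML being selected stays the same.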
|
130
|
+
|
131
|
+
|
5
132
|
h3. Reserved method names (ie. id_, root_)
|
6
133
|
|
7
134
|
Like Nokogiri ...
|
data/GETTING_FANCY.textile
CHANGED
@@ -4,9 +4,11 @@ h1. Getting Fancy
|
|
4
4
|
|
5
5
|
h2. Alternative ways to Manipulate Terms, Terminologies and their Builders
|
6
6
|
|
7
|
-
|
7
|
+
There is more than one way to build a terminology.
|
8
8
|
|
9
|
-
|
9
|
+
h3. "OM::XML::Terminology::Builder":OM/XML/Terminology/Builder.html Block Syntax
|
10
|
+
|
11
|
+
The simplest way to create an OM Terminology is to use the "OM::XML::Terminology::Builder":OM/XML/Terminology/Builder.html block syntax.
|
10
12
|
|
11
13
|
In the following examples, we will show different ways of building this Terminology:
|
12
14
|
|
@@ -24,21 +26,21 @@ In the following examples, we will show different ways of building this Terminol
|
|
24
26
|
</pre>
|
25
27
|
|
26
28
|
|
27
|
-
h3. Using Term::Builders
|
29
|
+
h3. Using "OM::XML::Term":OM/XML/Term.html ::Builders
|
28
30
|
|
29
|
-
First, create the Terminology Builder object
|
31
|
+
First, create the Terminology Builder object:
|
30
32
|
|
31
33
|
<pre>
|
32
34
|
terminology_builder = OM::XML::Terminology::Builder.new
|
33
35
|
</pre>
|
34
36
|
|
35
|
-
The .root method handles creating the root term and setting namespaces, schema, etc on the Terminology
|
37
|
+
The .root method handles creating the root term and setting namespaces, schema, etc. on the Terminology:
|
36
38
|
|
37
39
|
<pre>
|
38
40
|
terminology_builder.root(:path=>"grants", :xmlns=>"http://yourmediashelf.com/schemas/hydra-dataset/v0", :schema=>"http://example.org/schemas/grants-v1.xsd")
|
39
41
|
</pre>
|
40
42
|
|
41
|
-
|
43
|
+
This sets the namespaces for you and creates the "grants" root term:
|
42
44
|
|
43
45
|
<pre>
|
44
46
|
terminology_builder.namespaces
|
@@ -46,7 +48,7 @@ As you can see, this sets the namespaces for you and created the "grants" root t
|
|
46
48
|
terminology_builder.term_builders
|
47
49
|
</pre>
|
48
50
|
|
49
|
-
Create Term Builders for each of the Terms
|
51
|
+
Create Term Builders for each of the Terms:
|
50
52
|
|
51
53
|
<pre>
|
52
54
|
term1_builder = OM::XML::Term::Builder.new("grant", terminology_builder).path("grant")
|
@@ -55,7 +57,7 @@ Create Term Builders for each of the Terms
|
|
55
57
|
subterm2_builder = OM::XML::Term::Builder.new("number", terminology_builder)
|
56
58
|
</pre>
|
57
59
|
|
58
|
-
Assemble the tree of Term builders by adding child builders to their parents
|
60
|
+
Assemble the tree of Term builders by adding child builders to their parents; then add the top level terms to the root term in the Terminology builder:
|
59
61
|
|
60
62
|
<pre>
|
61
63
|
subterm1_builder.add_child(subsubterm_builder)
|
@@ -64,7 +66,7 @@ Assemble the tree of Term builders by adding child builders to their parents, th
|
|
64
66
|
terminology_builder.term_builders["grant"] = term1_builder
|
65
67
|
</pre>
|
66
68
|
|
67
|
-
Now build the Terminology, which will also call .build on each of the Term Builders
|
69
|
+
Now build the Terminology, which will also call .build on each of the Term Builders in the tree:
|
68
70
|
|
69
71
|
<pre>
|
70
72
|
built_terminology = terminology_builder.build
|
@@ -86,8 +88,8 @@ If you want to manipulate Terms and Terminologies directly rather than using the
|
|
86
88
|
|
87
89
|
People don't often do this, but the option is there if you need it.
|
88
90
|
|
89
|
-
Create the Terminology, set its namespaces & (optional) schema
|
90
|
-
Note that you have to set the :oxns namespaces to match :xmlns. This is usually done for you by the Terminology::Builder.root method.
|
91
|
+
Create the Terminology, set its namespaces & (optional) schema:
|
92
|
+
(Note that you have to set the :oxns namespaces to match :xmlns. This is usually done for you by the Terminology::Builder.root method.)
|
91
93
|
|
92
94
|
<pre>
|
93
95
|
handcrafted_terminology = OM::XML::Terminology.new
|
@@ -96,7 +98,7 @@ Note that you have to set the :oxns namespaces to match :xmlns. This is usually
|
|
96
98
|
handcrafted_terminology.schema = "http://example.org/schemas/grants-v1.xsd"
|
97
99
|
</pre>
|
98
100
|
|
99
|
-
Create the Terms
|
101
|
+
Create the Terms:
|
100
102
|
|
101
103
|
<pre>
|
102
104
|
# Create term1 (the root) and set it as the root term
|
@@ -126,7 +128,7 @@ Assemble the tree of terms by adding child terms to their parents, then add thos
|
|
126
128
|
handcrafted_terminology.add_term(term1)
|
127
129
|
</pre>
|
128
130
|
|
129
|
-
Generate the xpath queries for each term. This is usually done for you by the Term Builder.build method
|
131
|
+
Generate the xpath queries for each term. This is usually done for you by the Term Builder.build method:
|
130
132
|
|
131
133
|
<pre>
|
132
134
|
[root_term, term1, subterm1, subsubterm, subterm2].each {|t| t.generate_xpath_queries!}
|
data/GETTING_STARTED.textile
CHANGED
@@ -1,3 +1,25 @@
|
|
1
|
+
h2. OM (Opinionated Metadata) - Getting Started
|
2
|
+
|
3
|
+
OM allows you to define a "terminology" to ease translation between XML and ruby objects - you can query the xml for Nodes _or_ node values without ever writing a line of XPath.
|
4
|
+
|
5
|
+
OM "terms" are ruby symbols you define (in the terminology) that map specific XML content into ruby object attributes.
|
6
|
+
|
7
|
+
The API documentation at "http://hudson.projecthydra.org/job/om/Documentation/OM.html":http://hudson.projecthydra.org/job/om/Documentation/OM.html provides additional, more targeted information. We will provide links to the API as appropriate.
|
8
|
+
|
9
|
+
h4. What you will learn from this document
|
10
|
+
|
11
|
+
# Install OM and run it in IRB
|
12
|
+
# Build an OM Terminology
|
13
|
+
# Use OM XML Document class
|
14
|
+
# Create XML from the OM XML Document
|
15
|
+
# Load existing XML into an OM XML Document
|
16
|
+
# Query OM XML Document to get term values
|
17
|
+
# Access the Terminology of an OM XML Document
|
18
|
+
# Retrieve XPath from the terminology
|
19
|
+
|
20
|
+
|
21
|
+
h2. Install OM
|
22
|
+
|
1
23
|
To get started, you will create a new folder, set up a Gemfile to install OM, and then run bundler.
|
2
24
|
|
3
25
|
<pre>
|
@@ -5,20 +27,24 @@ mkdir omtest
|
|
5
27
|
cd omtest
|
6
28
|
</pre>
|
7
29
|
|
8
|
-
Using whichever editor you prefer, create a file called Gemfile with the following contents
|
30
|
+
Using whichever editor you prefer, create a file called Gemfile in the omtest directory with the following contents:
|
9
31
|
|
10
32
|
<pre>
|
11
33
|
source 'http://rubygems.org'
|
12
34
|
gem 'om'
|
13
35
|
</pre>
|
14
36
|
|
15
|
-
Now run bundler to install the gem:
|
37
|
+
Now run bundler to install the gem (you will need the bundler gem installed):
|
16
38
|
|
17
39
|
<pre>
|
18
40
|
bundle install
|
19
41
|
</pre>
|
20
42
|
|
21
|
-
You should now be set to use irb to run the following
|
43
|
+
You should now be set to use irb to run the following example.
|
44
|
+
|
45
|
+
h2. Build a simple OM terminology (in irb)
|
46
|
+
|
47
|
+
To experiment with abbreviated terminology examples, irb is your friend. If you are working on a persistent terminology and have to experiment to make sure you declare your terminology correctly, we recommend writing test code (e.g. with rspec). You can see examples of this "here":https://github.com/projecthydra/hydra-tutorial-application/blob/master/spec/models/journal_article_mods_datastream_spec.rb
|
22
48
|
|
23
49
|
<pre>
|
24
50
|
irb
|
@@ -28,7 +54,7 @@ require "om"
|
|
28
54
|
=> true
|
29
55
|
</pre>
|
30
56
|
|
31
|
-
|
57
|
+
Create a fairly simple Terminology Builder ("OM::XML::Terminology::Builder":OM/XML/Terminology/Builder.html) based on a couple of elements from the MODS schema.
|
32
58
|
|
33
59
|
<pre>
|
34
60
|
terminology_builder = OM::XML::Terminology::Builder.new do |t|
|
@@ -54,28 +80,15 @@ terminology_builder = OM::XML::Terminology::Builder.new do |t|
|
|
54
80
|
end
|
55
81
|
</pre>
|
56
82
|
|
57
|
-
Now tell the Builder to build your Terminology
|
83
|
+
Now tell the Terminology Builder to build your Terminology ("OM::XML::Terminology":OM/XML/Terminology.html):
|
58
84
|
|
59
85
|
<pre>terminology = terminology_builder.build</pre>
|
60
86
|
|
61
|
-
h2. Using a Terminology to generate XPath Queries based on Term Pointers ("OM::XML::TermXPathGenerator":OM/XML/TermXpathGenerator.html)
|
62
|
-
|
63
|
-
The Terminology handles generating xpath queries based on the structures you've defined. It will also run the queries for you, so in most cases you won't even have to look at the XPath. If you're ever curious what the xpath queries are, or if you want to use them in some other way, they are a few keystrokes away.
|
64
|
-
|
65
|
-
Here are the xpaths for :name and two variants of :name that were created using the :ref argument in the Terminology builder.
|
66
|
-
|
67
|
-
<pre>
|
68
|
-
terminology.xpath_for(:name)
|
69
|
-
=> "//oxns:name"
|
70
|
-
terminology.xpath_for(:person)
|
71
|
-
=> "//oxns:name[@type=\"personal\"]"
|
72
|
-
terminology.xpath_for(:organization)
|
73
|
-
=> "//oxns:name[@type=\"corporate\"]"
|
74
|
-
</pre>
|
75
|
-
|
76
87
|
h2. OM Documents
|
77
88
|
|
78
|
-
|
89
|
+
Generally you will use an "OM::XML::Document":OM/XML/Document.html to work with your xml. Here's how to define a Document class that uses the same Terminology as above.
|
90
|
+
|
91
|
+
In a separate window (so you can keep irb running), create the file my_mods_document.rb in the omtest directory, with this content:
|
79
92
|
|
80
93
|
<pre>
|
81
94
|
class MyModsDocument < ActiveFedora::NokogiriDatastream
|
@@ -101,7 +114,10 @@ In action, you will usually use "OM::XML::Document":OM/XML/Document.html to deal
|
|
101
114
|
}
|
102
115
|
end
|
103
116
|
|
117
|
+
# Generates an empty Mods Article (used when you call ModsArticle.new without passing in existing xml)
|
118
|
+
# (overrides default behavior of creating a plain xml document)
|
104
119
|
def self.xml_template
|
120
|
+
# use Nokogiri to build the XML
|
105
121
|
builder = Nokogiri::XML::Builder.new do |xml|
|
106
122
|
xml.mods(:version=>"3.3", "xmlns:xlink"=>"http://www.w3.org/1999/xlink",
|
107
123
|
"xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance",
|
@@ -122,15 +138,24 @@ In action, you will usually use "OM::XML::Document":OM/XML/Document.html to deal
|
|
122
138
|
}
|
123
139
|
}
|
124
140
|
end
|
141
|
+
# return a Nokogiri::XML::Document, not an OM::XML::Document
|
125
142
|
return builder.doc
|
126
143
|
end
|
127
144
|
|
128
145
|
end
|
129
146
|
</pre>
|
130
147
|
|
131
|
-
|
148
|
+
(Note that we are now also using the ActiveFedora gem.)
|
149
|
+
|
150
|
+
"OM::XML::Document":OM/XML/Document.html provides the set_terminology method to handle the details of creating a TerminologyBuilder and building the terminology for you. This allows you to focus on defining the structures of the Terminology itself.
|
132
151
|
|
133
|
-
h3. Creating XML Documents from Scratch
|
152
|
+
h3. Creating XML Documents from Scratch using OM
|
153
|
+
|
154
|
+
By default, new OM Document instances will create an empty xml document, but if you override self.xml_template to return a different object (e.g. "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html), that will be created instead.
|
155
|
+
|
156
|
+
In the example above, we have overridden xml_template to build an empty, relatively simple MODS document as a "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html. We use "Nokogiri::XML::Builder":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Builder.html and call its .doc method at the end of xml_template in order to return the "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html object. Instead of using "Nokogiri::XML::Builder":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Builder.html, you could put your template into an actual xml file and have xml_template use "Nokogiri::XML::Document.parse":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html#M000225 to load it. That's up to you. Create the documents however you want, just return a "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html.
|
157
|
+
|
158
|
+
to use "Nokogiri::XML::Builder":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Builder.html
|
134
159
|
|
135
160
|
<pre>
|
136
161
|
require "my_mods_document"
|
@@ -139,44 +164,27 @@ newdoc.to_xml
|
|
139
164
|
=> NoMethodError: undefined method `to_xml' for nil:NilClass
|
140
165
|
</pre>
|
141
166
|
|
142
|
-
By default, new OM Document instances will create an empty xml document. However, if you set self.xml_template to return a different "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html, that will be used instead.
|
143
|
-
|
144
|
-
In the example above, we have overridden xml_template to use "Nokogiri::XML::Builder":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Builder.html to build an empty, relatively simple MODS document. Note that at the end of the definition for xml_template, we call .doc on that XML Builder to return the "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html object. This is important because you need xml_template to return a "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html. Instead of using "Nokogiri::XML::Builder":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Builder.html, you could put your template into an actual xml file and have xml_template use "Nokogiri::XML::Document.parse":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html#M000225 to load it. That's up to you. Create the documents however you want, just return a "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html.
|
145
167
|
|
168
|
+
h3. Loading an existing XML document with OM
|
146
169
|
|
147
|
-
|
170
|
+
To load existing XML into your OM Document, use "#from_xml":OM/XML/Container/ClassMethods.html#from_xml-instance_method.
|
148
171
|
|
149
|
-
|
150
|
-
|
151
|
-
Download "hydrangea_article1.xml":https://github.com/mediashelf/om/blob/master/spec/fixtures/mods_articles/hydrangea_article1.xml into your working directory, then run this:
|
172
|
+
For an example, download "hydrangea_article1.xml":https://github.com/mediashelf/om/blob/master/spec/fixtures/mods_articles/hydrangea_article1.xml into your working directory (omtest), then run this in irb:
|
152
173
|
|
153
174
|
<pre>
|
154
175
|
sample_xml = File.new("hydrangea_article1.xml")
|
155
176
|
doc = MyModsDocument.from_xml(sample_xml)
|
156
177
|
</pre>
|
157
178
|
|
158
|
-
|
179
|
+
Take a look at the document object's xml that you've just populated. We will use this document for the next few examples.
|
159
180
|
|
160
181
|
<pre>doc.to_xml</pre>
|
161
182
|
|
162
|
-
h3. Directly accessing the "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html and the "OM::XML::Terminology":https://github.com/mediashelf/om/blob/master/lib/om/xml/terminology.rb
|
163
|
-
|
164
|
-
"OM::XML::Document":https://github.com/mediashelf/om/blob/master/lib/om/xml/document.rb is implemented as a container for a "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html. It uses the associated Terminology to provide a bunch of convenience methods that wrap calls to Nokogiri. If you ever need to operate directly on the Nokogiri Document, simply call ng_xml and do what you need to do. OM will not get in your way.
|
165
|
-
|
166
|
-
<pre>ng_document = doc.ng_xml</pre>
|
167
|
-
|
168
|
-
If you need to look at the Terminology associated with your Document, call "#terminology":OM/XML/Document/ClassMethods.html#terminology-instance_method on the _class_.
|
169
|
-
|
170
|
-
<pre>
|
171
|
-
MyModsDocument.terminology
|
172
|
-
doc.class.terminology
|
173
|
-
</pre>
|
174
|
-
|
175
183
|
h3. Querying OM Documents
|
176
184
|
|
177
|
-
Using the Terminology associated with your Document, you can query the xml for
|
185
|
+
Using the Terminology associated with your Document, you can query the xml for nodes _or_ node values without ever writing a line of XPath.
|
178
186
|
|
179
|
-
You can use OM::XML::Document.find_by_terms to retrieve xml
|
187
|
+
You can use OM::XML::Document.find_by_terms to retrieve xml _nodes_ from the datastream. It returns Nokogiri::XML::Node objects:
|
180
188
|
|
181
189
|
<pre>
|
182
190
|
doc.find_by_terms(:person)
|
@@ -184,14 +192,16 @@ doc.find_by_terms(:person).length
|
|
184
192
|
doc.find_by_terms(:person).each {|n| puts n.to_xml}
|
185
193
|
</pre>
|
186
194
|
|
187
|
-
|
195
|
+
You might prefer to use nodes as a way of getting multiple values pertaining to a node, rather than doing more expensive lookups for each desired value.
|
196
|
+
|
197
|
+
If you want to get directly to the _values_ within those nodes, use OM::XML::Document.term_values:
|
188
198
|
|
189
199
|
<pre>
|
190
200
|
doc.term_values(:person, :given_name)
|
191
201
|
doc.term_values(:person, :family_name)
|
192
202
|
</pre>
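The difference between retrieving nodes and retrieving values can be tried without OM. The sketch below uses Ruby's bundled REXML library and a made-up MODS-like fragment; the sample names and the XPath strings are assumptions for illustration, not what OM produces for :person or :given_name.

```ruby
require "rexml/document"

# Hypothetical, namespace-free fragment loosely shaped like a MODS name.
xml = <<~XML
  <mods>
    <name type="personal">
      <namePart type="given">Ada</namePart>
      <namePart type="family">Lovelace</namePart>
    </name>
  </mods>
XML
doc = REXML::Document.new(xml)

# Node lookup: one query returns the whole <name> element, from which
# several values can be read without further queries.
nodes = REXML::XPath.match(doc, '//name[@type="personal"]')

# Value lookup: go straight to the text of the nested element.
values = REXML::XPath.match(
  doc, '//name[@type="personal"]/namePart[@type="given"]'
).map(&:text)

puts nodes.length
puts values.inspect
```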
|
193
203
|
|
194
|
-
If the xpath points to XML nodes that contain other nodes, the response to term_values will contain Nokogiri::XML::Node objects instead of text values
|
204
|
+
If the xpath points to XML nodes that contain other nodes, the response to term_values will contain Nokogiri::XML::Node objects instead of text values:
|
195
205
|
|
196
206
|
<pre>
|
197
207
|
doc.term_values(:name)
|
@@ -205,12 +215,40 @@ For more examples of Updating OM Documents, see "Updating Documents":https://git
|
|
205
215
|
|
206
216
|
h3. Validating Documents
|
207
217
|
|
208
|
-
If you have
|
218
|
+
If you have an XML schema defined in your Terminology's root Term, you can validate any xml document by calling ".validate" on any instance of your Document classes.
|
209
219
|
|
210
220
|
<pre>doc.validate</pre>
|
211
221
|
|
212
|
-
__*Note:* this method requires an internet connection, as it will download the schema from the URL you have specified in the Terminology's root term.__
|
222
|
+
__*Note:* this method requires an internet connection, as it will download the XML schema from the URL you have specified in the Terminology's root term.__
|
223
|
+
|
224
|
+
h3. Directly accessing the "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html and the "OM::XML::Terminology":https://github.com/mediashelf/om/blob/master/lib/om/xml/terminology.rb
|
225
|
+
|
226
|
+
"OM::XML::Document":https://github.com/mediashelf/om/blob/master/lib/om/xml/document.rb is implemented as a container for a "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html. It uses the associated OM Terminology to provide a bunch of convenience methods that wrap calls to Nokogiri. If you ever need to operate directly on the Nokogiri Document, simply call ng_xml and do what you need to do. OM will not get in your way.
|
227
|
+
|
228
|
+
<pre>ng_document = doc.ng_xml</pre>
|
229
|
+
|
230
|
+
If you need to look at the Terminology associated with your Document, call "#terminology":OM/XML/Document/ClassMethods.html#terminology-instance_method on the Document's _class_.
|
231
|
+
|
232
|
+
<pre>
|
233
|
+
MyModsDocument.terminology
|
234
|
+
doc.class.terminology
|
235
|
+
</pre>
|
236
|
+
|
237
|
+
h2. Using a Terminology to generate XPath Queries based on Term Pointers
|
238
|
+
|
239
|
+
Because the Terminology is essentially a mapping from XPath queries to ruby object attributes, in most cases you won't need to know the actual XPath queries. Nevertheless, when you <i>do</i> want to know the XPath for a term (e.g. to check that your terminology is correct), the Terminology can generate the XPath queries from the structures you've defined ("OM::XML::TermXPathGenerator":OM/XML/TermXpathGenerator.html).
|
240
|
+
|
241
|
+
Here are the xpaths for :name and two variants of :name that were created using the :ref argument in the Terminology Builder:
|
242
|
+
|
243
|
+
<pre>
|
244
|
+
terminology.xpath_for(:name)
|
245
|
+
=> "//oxns:name"
|
246
|
+
terminology.xpath_for(:person)
|
247
|
+
=> "//oxns:name[@type=\"personal\"]"
|
248
|
+
terminology.xpath_for(:organization)
|
249
|
+
=> "//oxns:name[@type=\"corporate\"]"
|
250
|
+
</pre>
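To make the shape of these generated strings concrete, here is a simplified sketch of how a path plus attribute constraints could be turned into an XPath query. The method name sketch_xpath and its parameters are hypothetical; this is not OM's TermXPathGenerator, only an illustration that reproduces the output format shown above.

```ruby
# Hypothetical helper, not part of OM: builds an XPath string from an
# element name and a hash of attribute constraints.
def sketch_xpath(path, attributes = {}, prefix = "oxns")
  # Each attribute becomes a predicate such as [@type="personal"].
  predicates = attributes.map { |name, value| %([@#{name}="#{value}"]) }.join
  "//#{prefix}:#{path}#{predicates}"
end

puts sketch_xpath("name")
puts sketch_xpath("name", :type => "personal")
puts sketch_xpath("name", :type => "corporate")
```

Terms such as :person and :organization differ from :name only in the attribute constraint, which is why their XPath strings share the same element path.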
|
213
251
|
|
214
|
-
|
252
|
+
h2. Solrizing Documents
|
215
253
|
|
216
254
|
The solrizer gem provides support for indexing XML documents into Solr based on OM Terminologies. That process is documented in the "solrizer documentation":http://hudson.projecthydra.org/job/solrizer/Documentation/file.SOLRIZING_OM_DOCUMENTS.html
|
data/QUERYING_DOCUMENTS.textile
CHANGED
@@ -1,6 +1,6 @@
|
|
1
1
|
h2. Querying OM Documents
|
2
2
|
|
3
|
-
This document
|
3
|
+
This document will help you understand how to access the information associated with an "OM::XML::Document":OM/XML/Document.html object. We will explain some of the methods provided by the "OM::XML::Document":OM/XML/Document.html module and its related modules "OM::XML::TermXPathGenerator":OM/XML/TermXPathGenerator.html & "OM::XML::TermValueOperators":OM/XML/TermValueOperators.html
|
4
4
|
|
5
5
|
_Note: In your code, don't worry about including OM::XML::TermXPathGenerator and OM::XML::TermValueOperators into your classes. OM::XML::Document handles that for you._
|
6
6
|
|
@@ -8,7 +8,7 @@ h3. Load the Sample XML and Sample Terminology
|
|
8
8
|
|
9
9
|
These examples use the Document class defined in "OM::Samples::ModsArticle":https://github.com/mediashelf/om/blob/master/lib/om/samples/mods_article.rb
|
10
10
|
|
11
|
-
Download "hydrangea_article1.xml":https://github.com/mediashelf/om/blob/master/spec/fixtures/mods_articles/hydrangea_article1.xml into your working directory, then run this:
|
11
|
+
Download "hydrangea_article1.xml":https://github.com/mediashelf/om/blob/master/spec/fixtures/mods_articles/hydrangea_article1.xml (sample xml) into your working directory, then run this in irb:
|
12
12
|
|
13
13
|
<pre>
|
14
14
|
require "om/samples"
|
@@ -16,11 +16,15 @@ Download "hydrangea_article1.xml":https://github.com/mediashelf/om/blob/master/s
|
|
16
16
|
doc = OM::Samples::ModsArticle.from_xml(sample_xml)
|
17
17
|
</pre>
|
18
18
|
|
19
|
-
|
19
|
+
h2. Querying the "OM::XML::Document":OM/XML/Document.html
|
20
20
|
|
21
|
-
The OM
|
21
|
+
The "OM::XML::Terminology":OM/XML/Terminology.html declared by "OM::Samples::ModsArticle":https://github.com/mediashelf/om/blob/master/lib/om/samples/mods_article.rb maps the defined Terminology structure to xpath queries. It will also run the queries for you in most cases.
|
22
22
|
|
23
|
-
|
23
|
+
h4. xpath_for method of "OM::XML::Terminology":OM/XML/Terminology.html retrieves xpath expressions for OM terms
|
24
|
+
|
25
|
+
The xpath_for method retrieves the xpath used by the "OM::XML::Terminology":OM/XML/Terminology.html.
|
26
|
+
|
27
|
+
Examples of xpaths for :name and two variants of :name that were created using the :ref argument in the Terminology builder:
|
24
28
|
|
25
29
|
<pre>
|
26
30
|
OM::Samples::ModsArticle.terminology.xpath_for(:name)
|
@@ -31,20 +35,24 @@ OM::Samples::ModsArticle.terminology.xpath_for(:organization)
|
|
31
35
|
=> "//oxns:name[@type=\"corporate\"]"
|
32
36
|
</pre>
|
33
37
|
|
34
|
-
|
38
|
+
h4. Working with Terms
|
39
|
+
|
40
|
+
To retrieve the values of xml nodes, use the term_values method:
|
35
41
|
|
36
42
|
<pre>
|
37
43
|
doc.term_values(:person, :first_name)
|
38
44
|
doc.term_values(:person, :last_name)
|
39
45
|
</pre>
|
40
46
|
|
41
|
-
|
47
|
+
The term_values method is defined in the "OM::XML::TermValueOperators":OM/XML/TermValueOperators.html module, which is included in "OM::XML::Document":OM/XML/Document.html
|
48
|
+
|
49
|
+
Note that if a term's xpath mapping points to XML nodes that contain other nodes, the response to term_values will be Nokogiri::XML::Node objects instead of text values:
|
42
50
|
|
43
51
|
<pre>
|
44
52
|
doc.term_values(:name)
|
45
53
|
</pre>
|
46
54
|
|
47
|
-
More examples of using term_values and find_by_terms:
|
55
|
+
More examples of using term_values and find_by_terms (defined in "OM::XML::Document":OM/XML/Document.html):
|
48
56
|
|
49
57
|
<pre>
|
50
58
|
doc.find_by_terms(:organization).to_xml
|
@@ -54,7 +62,7 @@ doc.term_values(:organization, :namePart)
|
|
54
62
|
=> ["NSF"]
|
55
63
|
</pre>
|
56
64
|
|
57
|
-
|
65
|
+
To retrieve the values of nested terms, create a sequence of terms, from outermost to innermost:
|
58
66
|
|
59
67
|
<pre>
|
60
68
|
OM::Samples::ModsArticle.terminology.xpath_for(:journal, :issue, :pages, :start)
|
@@ -63,30 +71,23 @@ doc.term_values(:journal, :issue, :pages, :start)
|
|
63
71
|
=> ["195"]
|
64
72
|
</pre>
|
65
73
|
|
66
|
-
If you get one of the names wrong in the
|
74
|
+
If you get one of the term names wrong in the sequence, OM will tell you which one is causing problems. See what happens when you put :page instead of :pages in your argument to term_values.
|
67
75
|
|
68
76
|
<pre>
|
69
77
|
doc.term_values(:journal, :issue, :page, :start)
|
70
78
|
OM::XML::Terminology::BadPointerError: You attempted to retrieve a Term using this pointer: [:journal, :issue, :page] but no Term exists at that location. Everything is fine until ":page", which doesn't exist.
|
71
79
|
</pre>
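The idea behind this error message, walking the sequence of term names and stopping at the first one that does not exist, can be sketched in a few lines. The nested hash and the method names below are hypothetical stand-ins for a terminology; OM's own lookup is implemented differently.

```ruby
# Hypothetical nested term structure mirroring the pointer used above.
def sketch_terms
  { :journal => { :issue => { :pages => { :start => {} } } } }
end

# Walks a pointer (a sequence of term names) through the nested terms and
# returns the first name with no matching term, or nil if all resolve.
def first_missing_term(terms, pointer)
  current = terms
  pointer.each do |name|
    return name unless current.key?(name)
    current = current[name]
  end
  nil
end

puts first_missing_term(sketch_terms, [:journal, :issue, :page, :start]).inspect
puts first_missing_term(sketch_terms, [:journal, :issue, :pages, :start]).inspect
```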
|
72
80
|
|
73
|
-
If you use a term often and you're sick of typing all of those term names, you can define a proxy term. Here we have a proxy term called :start_page that saves you from having to remember the details of how MODS is structured.
|
74
81
|
|
75
|
-
|
76
|
-
OM::Samples::ModsArticle.terminology.xpath_for(:journal, :issue, :start_page)
|
77
|
-
=> "//oxns:relatedItem[@type=\"host\"]/oxns:part/oxns:extent[@unit=\"pages\"]/oxns:start"
|
78
|
-
</pre>
|
79
|
-
|
80
|
-
Here is another proxy term. It proxies to [:journal,:origin_info,:issuance]
|
82
|
+
h2. When XML Elements are Reused in a Document
|
81
83
|
|
82
|
-
|
83
|
-
OM::Samples::ModsArticle.terminology.xpath_for(:peer_reviewed)
|
84
|
-
=> "//oxns:relatedItem[@type=\"host\"]/oxns:originInfo/oxns:issuance"
|
85
|
-
</pre>
|
84
|
+
(Another way to put this: the xpath statement for a term can be ambiguous.)
|
86
85
|
|
87
|
-
|
86
|
+
In our MODS document, we have two distinct uses of the title XML element:
|
87
|
+
# the title of the published article
|
88
|
+
# the title of the journal it was published in.
|
88
89
|
|
89
|
-
|
90
|
+
How can we distinguish between these two uses?
|
90
91
|
|
91
92
|
<pre>
|
92
93
|
doc.term_values(:title_info, :main_title)
|
@@ -115,7 +116,9 @@ doc.term_values(:journal, :title_info, :main_title)
|
|
115
116
|
|
116
117
|
h2. Making life easier with Proxy Terms
|
117
118
|
|
118
|
-
|
119
|
+
If you use a nested term often, you may want to avoid typing the whole sequence of term names by defining a _proxy_ term.
|
120
|
+
|
121
|
+
As you can see in "OM::Samples::ModsArticle":https://github.com/mediashelf/om/blob/master/lib/om/samples/mods_article.rb, we have defined a few proxy terms for convenience.
|
119
122
|
|
120
123
|
<pre>
|
121
124
|
t.publication_url(:proxy=>[:location,:url])
|
@@ -124,9 +127,11 @@ t.title(:proxy=>[:mods,:title_info, :main_title])
|
|
124
127
|
t.journal_title(:proxy=>[:journal, :title_info, :main_title])
|
125
128
|
</pre>
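Conceptually, a proxy term is a short name that stands in for a longer pointer. The sketch below shows that substitution with a plain hash; sketch_proxies and expand_pointer are hypothetical names, and the table copies two of the proxy declarations shown above rather than reading them from OM.

```ruby
# Hypothetical proxy table, copied from the proxy declarations above.
def sketch_proxies
  {
    :publication_url => [:location, :url],
    :journal_title   => [:journal, :title_info, :main_title]
  }
end

# Replaces any proxy name in the pointer with the pointer it stands for;
# names that are not proxies are kept as they are.
def expand_pointer(pointer, proxies)
  pointer.flat_map { |name| proxies.fetch(name, [name]) }
end

puts expand_pointer([:journal_title], sketch_proxies).inspect
puts expand_pointer([:person, :given_name], sketch_proxies).inspect
```

This sketch expands only one level of proxying, which is enough to show why a proxy term can be queried like any other term.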
|
126
129
|
|
127
|
-
You can use
|
130
|
+
You can use proxy terms just like any other term when querying the document.
|
128
131
|
|
129
132
|
<pre>
|
133
|
+
OM::Samples::ModsArticle.terminology.xpath_for(:peer_reviewed)
|
134
|
+
=> "//oxns:relatedItem[@type=\"host\"]/oxns:originInfo/oxns:issuance"
|
130
135
|
OM::Samples::ModsArticle.terminology.xpath_for(:title)
|
131
136
|
=> "//oxns:mods/oxns:titleInfo/oxns:title"
|
132
137
|
OM::Samples::ModsArticle.terminology.xpath_for(:journal_title)
|
data/README.rdoc
CHANGED
@@ -1,6 +1,10 @@
|
|
1
1
|
= opinionated-xml
|
2
2
|
|
3
|
-
A library to help you tame sprawling XML schemas like MODS.
|
3
|
+
A library to help you tame sprawling XML schemas like MODS.
|
4
|
+
|
5
|
+
OM allows you to define a “terminology” to ease translation between XML and ruby objects – you can query the xml for Nodes or node values without ever writing a line of XPath.
|
6
|
+
|
7
|
+
OM “terms” are ruby symbols you define (in the terminology) that map specific XML content into ruby object attributes.
|
4
8
|
|
5
9
|
== Note on Patches/Pull Requests
|
6
10
|
|
data/README.textile
CHANGED
@@ -2,6 +2,10 @@ h1. om (Optinionated Metadata)
|
|
2
2
|
|
3
3
|
A library to help you tame sprawling XML schemas like MODS.
|
4
4
|
|
5
|
+
OM allows you to define a “terminology” to ease translation between XML and ruby objects – you can query the xml for Nodes or node values without ever writing a line of XPath.
|
6
|
+
|
7
|
+
OM “terms” are ruby symbols you define (in the terminology) that map specific XML content into ruby object attributes.
|
8
|
+
|
5
9
|
h2. Tutorials
|
6
10
|
|
7
11
|
* "Getting Started":http://hudson.projecthydra.org/job/om/Documentation/file.GETTING_STARTED.html
|