om 1.2.4 → 1.2.5
- data/.gitignore +25 -0
- data/.rvmrc +33 -0
- data/COMMON_OM_PATTERNS.textile +14 -0
- data/GETTING_FANCY.textile +143 -0
- data/GETTING_STARTED.textile +216 -0
- data/Gemfile +7 -0
- data/Gemfile.lock +7 -1
- data/History.textile +4 -0
- data/QUERYING_DOCUMENTS.textile +134 -0
- data/README.textile +19 -10
- data/Rakefile +7 -28
- data/UPDATING_DOCUMENTS.textile +3 -0
- data/lib/om/samples/mods_article.rb +30 -15
- data/lib/om/version.rb +3 -0
- data/lib/om/xml/document.rb +1 -0
- data/lib/om/xml/term.rb +8 -0
- data/lib/om/xml/terminology.rb +10 -0
- data/lib/om.rb +5 -1
- data/lib/tasks/om.rake +56 -0
- data/om.gemspec +19 -207
- data/spec/unit/document_spec.rb +7 -1
- data/spec/unit/nokogiri_sanity_spec.rb +75 -0
- metadata +41 -523
data/.gitignore
ADDED
data/.rvmrc
ADDED
@@ -0,0 +1,33 @@
+#!/usr/bin/env bash
+
+# This is an RVM Project .rvmrc file, used to automatically load the ruby
+# development environment upon cd'ing into the directory
+
+ruby_string="ree-1.8.7"
+gemset_name="om"
+
+# Install rubies when used instead of only displaying a warning and exiting
+rvm_install_on_use_flag=1
+
+# Specify our desired <ruby>[@<gemset>], the @gemset name is optional.
+environment_id="${ruby_string}@${gemset_name}"
+
+# First, attempt to load the desired environment directly from the environment
+# file. This is very fast and efficient compared to running through the entire
+# CLI and selector. If you want feedback on which environment was used then
+# insert the word 'use' after --create as this triggers verbose mode.
+#
+if [[ -d "${rvm_path:-$HOME/.rvm}/environments" \
+  && -s "${rvm_path:-$HOME/.rvm}/environments/$environment_id" ]] ; then
+  \. "${rvm_path:-$HOME/.rvm}/environments/$environment_id"
+else
+  # If the environment file has not yet been created, use the RVM CLI to select.
+  rvm --create "$environment_id"
+fi
+
+# Ensure that Bundler is installed, install it if it is not.
+if ! (command -v bundle > /dev/null) ; then
+  printf "The rubygem 'bundler' is not installed, installing it now.\n"
+  gem install bundler
+fi
+
data/COMMON_OM_PATTERNS.textile
ADDED
@@ -0,0 +1,14 @@
+h1. Common Patterns You'll Use with OM
+
+h2. Common Terminology Patterns
+
+h3. Reserved method names (ie. id_, root_)
+
+Like Nokogiri ...
+
+h3. Namespaces
+oxns
+document namespaces & node namespaces
+_no namespace_ (suppressing oxns in xpath queries)
+
+h3. :ref and :proxy Terms
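The "Reserved method names" and "Like Nokogiri" notes above refer to the trailing-underscore convention (id_, root_). Nokogiri's Builder documents the same trick: it creates elements through method_missing, so an element name that collides with a method Ruby already defines needs an underscore appended. The sketch below illustrates that mechanism in plain Ruby only. SketchTermRecorder is a hypothetical class, not part of OM, and the sketch assumes, without checking OM's source, that OM's Terminology builder resolves term names in a broadly similar way.

```ruby
# Hypothetical sketch: SketchTermRecorder is NOT OM's Terminology::Builder.
# Unknown method calls are recorded as term names via method_missing.
class SketchTermRecorder
  attr_reader :terms, :root_options

  def initialize
    @terms = []
    @root_options = nil
  end

  # A real method, standing in for the builder's t.root(...). Because it
  # exists, rec.root(...) never reaches method_missing.
  def root(options = {})
    @root_options = options
    self
  end

  # Any method Ruby cannot find becomes a term. One trailing underscore is
  # stripped, so rec.root_ records :root and rec.display_ records :display.
  def method_missing(method_name, *_args, &_block)
    @terms << method_name.to_s.sub(/_\z/, "").to_sym
    self
  end
end

rec = SketchTermRecorder.new
rec.root(:path => "mods") # the builder method: stores options, records no term
rec.title                 # no such method, so it is recorded as :title
rec.display_              # plain rec.display would call Ruby's built-in #display
rec.root_                 # a term named "root" despite the builder method
p rec.terms               # prints [:title, :display, :root]
```

The convention works because Ruby dispatches to an existing method before method_missing is ever consulted, so a colliding name never reaches the builder unless the caller changes it; a single trailing underscore is a cheap change that is easy to strip again.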
data/GETTING_FANCY.textile
ADDED
@@ -0,0 +1,143 @@
+h1. Getting Fancy
+
+
+
+h2. Alternative ways to Manipulate Terms, Terminologies and their Builders
+
+h3. The Example
+
+There are a few ways to build a terminology. The simplest way is to use the Terminology Builder block syntax. This is what most of the tutorials use.
+
+In the following examples, we will show different ways of building this Terminology:
+
+<pre>
+builder = OM::XML::Terminology::Builder.new do |t|
+  t.root(:path=>"grants", :xmlns=>"http://yourmediashelf.com/schemas/hydra-dataset/v0", :schema=>"http://example.org/schemas/grants-v1.xsd")
+  t.grant {
+    t.org(:path=>"organization", :attributes=>{:type=>"funder"}) {
+      t.name(:index_as=>:searchable)
+    }
+    t.number
+  }
+end
+another_terminology = builder.build
+</pre>
+
+
+h3. Using Term::Builders
+
+First, create the Terminology Builder object.
+
+<pre>
+terminology_builder = OM::XML::Terminology::Builder.new
+</pre>
+
+The .root method handles creating the root term and setting namespaces, schema, etc. on the Terminology.
+
+<pre>
+terminology_builder.root(:path=>"grants", :xmlns=>"http://yourmediashelf.com/schemas/hydra-dataset/v0", :schema=>"http://example.org/schemas/grants-v1.xsd")
+</pre>
+
+As you can see, this sets the namespaces for you and creates the "grants" root term.
+
+<pre>
+terminology_builder.namespaces
+=> {"oxns"=>"http://yourmediashelf.com/schemas/hydra-dataset/v0", "xmlns"=>"http://yourmediashelf.com/schemas/hydra-dataset/v0"}
+terminology_builder.term_builders
+</pre>
+
+Create Term Builders for each of the Terms.
+
+<pre>
+term1_builder = OM::XML::Term::Builder.new("grant", terminology_builder).path("grant")
+subterm1_builder = OM::XML::Term::Builder.new("org", terminology_builder).path("organization").attributes(:type=>"funder")
+subsubterm_builder = OM::XML::Term::Builder.new("name", terminology_builder).index_as(:searchable)
+subterm2_builder = OM::XML::Term::Builder.new("number", terminology_builder)
+</pre>
+
+Assemble the tree of Term builders by adding child builders to their parents, then add those to the Terminology builder.
+
+<pre>
+subterm1_builder.add_child(subsubterm_builder)
+term1_builder.add_child(subterm1_builder)
+term1_builder.add_child(subterm2_builder)
+terminology_builder.term_builders["grant"] = term1_builder
+</pre>
+
+Now build the Terminology, which will also call .build on each of the Term Builders.
+
+<pre>
+built_terminology = terminology_builder.build
+</pre>
+
+Test it out:
+
+<pre>
+built_terminology.retrieve_term(:grant, :org, :name)
+built_terminology.xpath_for(:grant, :org, :name)
+built_terminology.root_terms
+built_terminology.terms.keys # This will only return the Terms at the root of the terminology hierarchy
+built_terminology.retrieve_term(:grant).children.keys
+</pre>
+
+h3. Creating Terms & Terminologies without any Builders
+
+If you want to manipulate Terms and Terminologies directly rather than using the Builder classes, you can consume their APIs at any time.
+
+People don't often do this, but the option is there if you need it.
+
+Create the Terminology, then set its namespaces & (optional) schema.
+Note that you have to set the :oxns namespace to match :xmlns. This is usually done for you by the Terminology::Builder.root method.
+
+<pre>
+handcrafted_terminology = OM::XML::Terminology.new
+handcrafted_terminology.namespaces[:xmlns] = "http://yourmediashelf.com/schemas/hydra-dataset/v0"
+handcrafted_terminology.namespaces[:oxns] = "http://yourmediashelf.com/schemas/hydra-dataset/v0"
+handcrafted_terminology.schema = "http://example.org/schemas/grants-v1.xsd"
+</pre>
+
+Create the Terms.
+
+<pre>
+# Create the root term (grants) and flag it as the root term
+root_term = OM::XML::Term.new("grants")
+root_term.is_root_term = true
+
+# Create term1 (grant) and its subterms
+term1 = OM::XML::Term.new("grant")
+
+subterm1 = OM::XML::Term.new("org")
+subterm1.path = "organization"
+subterm1.attributes = {:type=>"funder"}
+
+subsubterm = OM::XML::Term.new("name")
+subsubterm.index_as = :searchable
+
+subterm2 = OM::XML::Term.new("number")
+</pre>
+
+Assemble the tree of terms by adding child terms to their parents, then add those to the Terminology.
+
+<pre>
+subterm1.add_child(subsubterm)
+term1.add_child(subterm1)
+term1.add_child(subterm2)
+handcrafted_terminology.add_term(root_term)
+handcrafted_terminology.add_term(term1)
+</pre>
+
+Generate the xpath queries for each term. This is usually done for you by the Term Builder's .build method.
+
+<pre>
+[root_term, term1, subterm1, subsubterm, subterm2].each {|t| t.generate_xpath_queries!}
+</pre>
+
+Test it out:
+
+<pre>
+handcrafted_terminology.retrieve_term(:grant, :org, :name)
+handcrafted_terminology.xpath_for(:grant, :org, :name)
+handcrafted_terminology.root_terms
+handcrafted_terminology.terms.keys # This will only return the Terms at the root of the terminology hierarchy
+handcrafted_terminology.retrieve_term(:grant).children.keys
+</pre>
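Both approaches in this file end with a tree of terms that xpath_for turns into an XPath string. As a rough mental model only, the sketch below walks a pointer such as [:grant, :org, :name] down a hypothetical tree and emits one oxns-prefixed step per term, with attribute predicates written in the same style as the //oxns:name[@type="personal"] strings shown in the tutorials. SketchTerm and sketch_xpath_for are made-up names; OM's own TermXPathGenerator handles more cases, and its exact output for this terminology may differ.

```ruby
# Hypothetical, simplified term model; NOT OM::XML::Term.
SketchTerm = Struct.new(:name, :path, :attributes, :children) do
  def initialize(name, path: nil, attributes: {}, children: {})
    # In this sketch, a term's path defaults to its name.
    super(name, path || name.to_s, attributes, children)
  end

  def add_child(term)
    children[term.name] = term
    self
  end

  # One XPath step: the oxns-prefixed element name plus attribute predicates.
  def xpath_step
    predicates = attributes.map { |key, value| "[@#{key}=\"#{value}\"]" }.join
    "oxns:#{path}#{predicates}"
  end
end

# Follow the pointer from the root terms, one child lookup per key, and
# join the collected steps under a leading "//".
def sketch_xpath_for(root_terms, *pointer)
  terms = root_terms
  steps = pointer.map do |key|
    term = terms.fetch(key) { raise ArgumentError, "no term #{key.inspect} at this level" }
    terms = term.children
    term.xpath_step
  end
  "//" + steps.join("/")
end

# The grant/org/name terminology from the examples above.
name_term = SketchTerm.new(:name)
org = SketchTerm.new(:org, path: "organization", attributes: { type: "funder" })
org.add_child(name_term)
grant = SketchTerm.new(:grant)
grant.add_child(org).add_child(SketchTerm.new(:number))
root_terms = { grant: grant }

puts sketch_xpath_for(root_terms, :grant, :org, :name)
# prints //oxns:grant/oxns:organization[@type="funder"]/oxns:name
```

Keeping each term responsible for only its own step is what lets the same subtree be reused under different parents: the full query is assembled from the pointer at lookup time, not stored on the term.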
data/GETTING_STARTED.textile
ADDED
@@ -0,0 +1,216 @@
+To get started, you will create a new folder, set up a Gemfile to install OM, and then run bundler.
+
+<pre>
+mkdir omtest
+cd omtest
+</pre>
+
+Using whichever editor you prefer, create a file called Gemfile with the following contents:
+
+<pre>
+source 'http://rubygems.org'
+gem 'om'
+</pre>
+
+Now run bundler to install the gem:
+
+<pre>
+bundle install
+</pre>
+
+You should now be set to use irb to run the following examples.
+
+<pre>
+irb
+require "rubygems"
+=> true
+require "om"
+=> true
+</pre>
+
+Builder for a simple Terminology based on a couple of elements from the MODS schema.
+
+<pre>
+terminology_builder = OM::XML::Terminology::Builder.new do |t|
+  t.root(:path=>"mods", :xmlns=>"http://www.loc.gov/mods/v3", :schema=>"http://www.loc.gov/standards/mods/v3/mods-3-2.xsd")
+  # This is a mods:name. The underscore is purely to avoid namespace conflicts.
+  t.name_ {
+    t.namePart
+    t.role(:ref=>[:role])
+    t.family_name(:path=>"namePart", :attributes=>{:type=>"family"})
+    t.given_name(:path=>"namePart", :attributes=>{:type=>"given"}, :label=>"first name")
+    t.terms_of_address(:path=>"namePart", :attributes=>{:type=>"termsOfAddress"})
+  }
+
+  # Re-use the structure of a :name Term with a different @type attribute
+  t.person(:ref=>:name, :attributes=>{:type=>"personal"})
+  t.organization(:ref=>:name, :attributes=>{:type=>"corporate"})
+
+  # This is a mods:role, which is used within mods:name elements
+  t.role {
+    t.text(:path=>"roleTerm",:attributes=>{:type=>"text"})
+    t.code(:path=>"roleTerm",:attributes=>{:type=>"code"})
+  }
+end
+</pre>
+
+Now tell the Builder to build your Terminology for you.
+
+<pre>terminology = terminology_builder.build</pre>
+
+h2. Using a Terminology to generate XPath Queries based on Term Pointers ("OM::XML::TermXPathGenerator":OM/XML/TermXPathGenerator.html)
+
+The Terminology handles generating xpath queries based on the structures you've defined. It will also run the queries for you, so in most cases you won't even have to look at the XPath. If you're ever curious what the xpath queries are, or if you want to use them in some other way, they are a few keystrokes away.
+
+Here are the xpaths for :name and two variants of :name that were created using the :ref argument in the Terminology builder.
+
+<pre>
+terminology.xpath_for(:name)
+=> "//oxns:name"
+terminology.xpath_for(:person)
+=> "//oxns:name[@type=\"personal\"]"
+terminology.xpath_for(:organization)
+=> "//oxns:name[@type=\"corporate\"]"
+</pre>
+
+h2. OM Documents
+
+In practice, you will usually use "OM::XML::Document":OM/XML/Document.html to deal with your xml. Here's how to define a Document class that uses the same Terminology as above. In a separate window, create the file my_mods_document.rb in the directory you created at the beginning of this document.
+
+<pre>
+class MyModsDocument
+  include OM::XML::Document
+
+  set_terminology do |t|
+    t.root(:path=>"mods", :xmlns=>"http://www.loc.gov/mods/v3", :schema=>"http://www.loc.gov/standards/mods/v3/mods-3-2.xsd")
+    # This is a mods:name. The underscore is purely to avoid namespace conflicts.
+    t.name_ {
+      t.namePart
+      t.role(:ref=>[:role])
+      t.family_name(:path=>"namePart", :attributes=>{:type=>"family"})
+      t.given_name(:path=>"namePart", :attributes=>{:type=>"given"}, :label=>"first name")
+      t.terms_of_address(:path=>"namePart", :attributes=>{:type=>"termsOfAddress"})
+    }
+    t.person(:ref=>:name, :attributes=>{:type=>"personal"})
+    t.organization(:ref=>:name, :attributes=>{:type=>"corporate"})
+
+    # This is a mods:role, which is used within mods:name elements
+    t.role {
+      t.text(:path=>"roleTerm",:attributes=>{:type=>"text"})
+      t.code(:path=>"roleTerm",:attributes=>{:type=>"code"})
+    }
+  end
+
+  def self.xml_template
+    builder = Nokogiri::XML::Builder.new do |xml|
+      xml.mods(:version=>"3.3", "xmlns:xlink"=>"http://www.w3.org/1999/xlink",
+         "xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance",
+         "xmlns"=>"http://www.loc.gov/mods/v3",
+         "xsi:schemaLocation"=>"http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd") {
+        xml.titleInfo(:lang=>"") {
+          xml.title
+        }
+        xml.name(:type=>"personal") {
+          xml.namePart(:type=>"given")
+          xml.namePart(:type=>"family")
+          xml.affiliation
+          xml.computing_id
+          xml.description
+          xml.role {
+            xml.roleTerm("Author", :authority=>"marcrelator", :type=>"text")
+          }
+        }
+      }
+    end
+    return builder.doc
+  end
+
+end
+</pre>
+
+OM::XML::Document provides the set_terminology method to handle the details of creating a TerminologyBuilder and building the terminology for you. This allows you to focus on defining the structures of the Terminology itself.
+
+h3. Creating XML Documents from Scratch
+
+<pre>
+require "my_mods_document"
+newdoc = MyModsDocument.new
+newdoc.to_xml
+=> NoMethodError: undefined method `to_xml' for nil:NilClass
+</pre>
+
+By default, new OM Document instances will create an empty xml document. However, if you set self.xml_template to return a different "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html, that will be used instead.
+
+In the example above, we have overridden xml_template to use "Nokogiri::XML::Builder":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Builder.html to build an empty, relatively simple MODS document. Note that at the end of the definition for xml_template, we call .doc on that XML Builder to return the "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html object. This is important because you need xml_template to return a "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html. Instead of using "Nokogiri::XML::Builder":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Builder.html, you could put your template into an actual xml file and have xml_template use "Nokogiri::XML::Document.parse":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html#M000225 to load it. That's up to you. Create the documents however you want, as long as you return a "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html.
+
+
+h3. Loading an existing XML document
+
+To load existing XML into your OM Document, use "#from_xml":OM/XML/Container/ClassMethods.html#from_xml-instance_method
+
+Download "hydrangea_article1.xml":https://github.com/mediashelf/om/blob/master/spec/fixtures/mods_articles/hydrangea_article1.xml into your working directory, then run this:
+
+<pre>
+sample_xml = File.new("hydrangea_article1.xml")
+doc = MyModsDocument.from_xml(sample_xml)
+</pre>
+
+Now take a look at the document you've loaded. We will use this document for the next few examples.
+
+<pre>doc.to_xml</pre>
+
+h3. Directly accessing the "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html and the "OM::XML::Terminology":https://github.com/mediashelf/om/blob/master/lib/om/xml/terminology.rb
+
+"OM::XML::Document":https://github.com/mediashelf/om/blob/master/lib/om/xml/document.rb is implemented as a container for a "Nokogiri::XML::Document":http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html. It uses the associated Terminology to provide a bunch of convenience methods that wrap calls to Nokogiri. If you ever need to operate directly on the Nokogiri Document, simply call ng_xml and do what you need to do. OM will not get in your way.
+
+<pre>ng_document = doc.ng_xml</pre>
+
+If you need to look at the Terminology associated with your Document, call "#terminology":OM/XML/Document/ClassMethods.html#terminology-instance_method on the _class_.
+
+<pre>
+MyModsDocument.terminology
+doc.class.terminology
+</pre>
+
+h3. Querying OM Documents
+
+Using the Terminology associated with your Document, you can query the xml for Nodes _or_ node values without ever writing a line of XPath.
+
+You can use OM::XML::Document.find_by_terms to retrieve xml nodes from the document. It returns Nokogiri::XML::Node objects.
+
+<pre>
+doc.find_by_terms(:person)
+doc.find_by_terms(:person).length
+doc.find_by_terms(:person).each {|n| puts n.to_xml}
+</pre>
+
+If you want to get directly to the _values_ within those nodes, use OM::XML::Document.term_values
+
+<pre>
+doc.term_values(:person, :given_name)
+doc.term_values(:person, :family_name)
+</pre>
+
+If the xpath points to XML nodes that contain other nodes, the response to term_values will contain Nokogiri::XML::Node objects instead of text values.
+
+<pre>
+doc.term_values(:name)
+</pre>
+
+For more examples of Querying OM Documents, see "Querying Documents":https://github.com/mediashelf/om/blob/master/QUERYING_DOCUMENTS.textile
+
+h3. Updating, Inserting & Deleting Elements (TermValueOperators)
+
+For more examples of Updating OM Documents, see "Updating Documents":https://github.com/mediashelf/om/blob/master/UPDATING_DOCUMENTS.textile
+
+h3. Validating Documents
+
+If you have a schema defined in your Terminology's root Term, you can validate any xml document by calling ".validate" on any instance of your Document classes.
+
+<pre>doc.validate</pre>
+
+__*Note:* this method requires an internet connection, as it will download the schema from the URL you have specified in the Terminology's root term.__
+
+h3. Solrizing Documents
+
+The solrizer gem provides support for indexing XML documents into Solr based on OM Terminologies. That process is documented in the "solrizer documentation":http://hudson.projecthydra.org/job/solrizer/Documentation/file.SOLRIZING_OM_DOCUMENTS.html
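As described above, term_values runs a term's XPath for you and hands back the matching values. The sketch below does the same thing by hand for the :person example, so the generated queries are less of a black box. It assumes Ruby's bundled REXML library rather than Nokogiri (which OM uses), a small made-up MODS fragment instead of a real document, and a hypothetical helper, sketch_term_values; the XPath strings are written out by hand following the //oxns:name[@type="personal"] pattern, not generated by OM.

```ruby
require "rexml/document"

MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical helper, NOT OM's term_values: evaluate an XPath with the
# "oxns" prefix bound to the MODS namespace and return each node's text.
def sketch_term_values(doc, xpath)
  REXML::XPath.match(doc, xpath, { "oxns" => MODS_NS }).map(&:text)
end

# A small, made-up MODS fragment; its elements are in the MODS default namespace.
xml = <<~XML
  <mods xmlns="http://www.loc.gov/mods/v3">
    <name type="personal">
      <namePart type="given">Jane</namePart>
      <namePart type="family">Doe</namePart>
    </name>
    <name type="corporate">
      <namePart>NSF</namePart>
    </name>
  </mods>
XML
doc = REXML::Document.new(xml)

# :person reuses the :name structure (via :ref) and adds @type="personal".
person = '//oxns:name[@type="personal"]'
p sketch_term_values(doc, "#{person}/oxns:namePart[@type=\"given\"]")  # ["Jane"]
p sketch_term_values(doc, "#{person}/oxns:namePart[@type=\"family\"]") # ["Doe"]
```

The explicit prefix binding is the important part. In XPath 1.0 an unprefixed name such as //name only matches elements in no namespace, so it would miss MODS elements that live in a default namespace; binding oxns to the MODS URI appears to be the role the oxns entry in a Terminology's namespaces plays in OM's generated queries.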
|
data/Gemfile
CHANGED
data/Gemfile.lock
CHANGED
@@ -1,13 +1,14 @@
 PATH
   remote: .
   specs:
-    om (1.2.
+    om (1.2.4)
       nokogiri (>= 1.4.2)
       om
 
 GEM
   remote: http://rubygems.org/
   specs:
+    RedCloth (4.2.7)
     columnize (0.3.2)
     equivalent-xml (0.2.6)
       nokogiri (>= 1.4.3)
@@ -20,20 +21,25 @@ GEM
     mocha (0.9.12)
     nokogiri (1.4.4)
     rake (0.8.7)
+    rcov (0.9.9)
     rspec (1.3.1)
     ruby-debug (0.10.4)
       columnize (>= 0.1)
       ruby-debug-base (~> 0.10.4.0)
     ruby-debug-base (0.10.4)
       linecache (>= 0.3)
+    yard (0.6.8)
 
 PLATFORMS
   ruby
 
 DEPENDENCIES
+  RedCloth
   equivalent-xml (>= 0.2.4)
   jeweler
   mocha (>= 0.9.8)
   om!
+  rcov
   rspec (< 2.0.0)
   ruby-debug
+  yard
data/History.textile
CHANGED
data/QUERYING_DOCUMENTS.textile
ADDED
@@ -0,0 +1,134 @@
+h2. Querying OM Documents
+
+This document gives you some exposure to the methods provided by the "OM::XML::Document":OM/XML/Document.html module and its related modules "OM::XML::TermXPathGenerator":OM/XML/TermXPathGenerator.html & "OM::XML::TermValueOperators":OM/XML/TermValueOperators.html
+
+_Note: In your code, don't worry about including OM::XML::TermXPathGenerator and OM::XML::TermValueOperators into your classes. OM::XML::Document handles that for you._
+
+h3. Load the Sample XML and Sample Terminology
+
+These examples use the Document class defined in "OM::Samples::ModsArticle":https://github.com/mediashelf/om/blob/master/lib/om/samples/mods_article.rb
+
+Download "hydrangea_article1.xml":https://github.com/mediashelf/om/blob/master/spec/fixtures/mods_articles/hydrangea_article1.xml into your working directory, then run this:
+
+<pre>
+require "om/samples"
+sample_xml = File.new("hydrangea_article1.xml")
+doc = OM::Samples::ModsArticle.from_xml(sample_xml)
+</pre>
+
+h3. Query the Document
+
+The OM Terminology declared by OM::Samples::ModsArticle handles generating xpath queries based on the structures you've defined. It will also run the queries for you in most cases. If you're ever curious what the xpath queries are, or if you want to use them in some other way, they are a few keystrokes away.
+
+Here are the xpaths for :name and two variants of :name that were created using the :ref argument in the Terminology builder.
+
+<pre>
+OM::Samples::ModsArticle.terminology.xpath_for(:name)
+=> "//oxns:name"
+OM::Samples::ModsArticle.terminology.xpath_for(:person)
+=> "//oxns:name[@type=\"personal\"]"
+OM::Samples::ModsArticle.terminology.xpath_for(:organization)
+=> "//oxns:name[@type=\"corporate\"]"
+</pre>
+
+To retrieve the values of xml nodes, use the term_values method
+
+<pre>
+doc.term_values(:person, :first_name)
+doc.term_values(:person, :last_name)
+</pre>
+
+If the xpath points to XML nodes that contain other nodes, the response to term_values will contain Nokogiri::XML::Node objects instead of text values.
+
+<pre>
+doc.term_values(:name)
+</pre>
+
+More examples of using term_values and find_by_terms:
+
+<pre>
+doc.find_by_terms(:organization).to_xml
+doc.term_values(:organization, :role)
+=> ["\n Funder\n "]
+doc.term_values(:organization, :namePart)
+=> ["NSF"]
+</pre>
+
+You will often string together a series of term names to point to what you want
+
+<pre>
+OM::Samples::ModsArticle.terminology.xpath_for(:journal, :issue, :pages, :start)
+=> "//oxns:relatedItem[@type=\"host\"]/oxns:part/oxns:extent[@unit=\"pages\"]/oxns:start"
+doc.term_values(:journal, :issue, :pages, :start)
+=> ["195"]
+</pre>
+
+If you get one of the names wrong in the list, OM will tell you which one is causing problems. See what happens when you put :page instead of :pages in your argument to term_values.
+
+<pre>
+doc.term_values(:journal, :issue, :page, :start)
+OM::XML::Terminology::BadPointerError: You attempted to retrieve a Term using this pointer: [:journal, :issue, :page] but no Term exists at that location. Everything is fine until ":page", which doesn't exist.
+</pre>
+
+If you use a term often and you're sick of typing all of those term names, you can define a proxy term. Here we have a proxy term called :start_page that saves you from having to remember the details of how MODS is structured.
+
+<pre>
+OM::Samples::ModsArticle.terminology.xpath_for(:journal, :issue, :start_page)
+=> "//oxns:relatedItem[@type=\"host\"]/oxns:part/oxns:extent[@unit=\"pages\"]/oxns:start"
+</pre>
+
+Here is another proxy term. It proxies to [:journal,:origin_info,:issuance]
+
+<pre>
+OM::Samples::ModsArticle.terminology.xpath_for(:peer_reviewed)
+=> "//oxns:relatedItem[@type=\"host\"]/oxns:originInfo/oxns:issuance"
+</pre>
+
+h2. What to do when elements are reused throughout an XML document
+
+In our MODS document, we have two types of title: 1) the title of the published article and 2) the title of the journal it was published in. They both use the same xml node. How can we deal with that?
+
+<pre>
+doc.term_values(:title_info, :main_title)
+=> ["ARTICLE TITLE", "VARYING FORM OF TITLE", "TITLE OF HOST JOURNAL"]
+doc.term_values(:mods, :title_info, :main_title)
+=> ["ARTICLE TITLE", "VARYING FORM OF TITLE"]
+OM::Samples::ModsArticle.terminology.xpath_for(:title_info, :main_title)
+=> "//oxns:titleInfo/oxns:title"
+</pre>
+
+The solution: include the root node in your term pointer.
+
+<pre>
+OM::Samples::ModsArticle.terminology.xpath_for(:mods, :title_info, :main_title)
+=> "//oxns:mods/oxns:titleInfo/oxns:title"
+doc.term_values(:mods, :title_info, :main_title)
+=> ["ARTICLE TITLE", "VARYING FORM OF TITLE"]
+</pre>
+
+We can still access the Journal title by its own pointers:
+
+<pre>
+doc.term_values(:journal, :title_info, :main_title)
+=> ["TITLE OF HOST JOURNAL"]
+</pre>
+
+h2. Making life easier with Proxy Terms
+
+Sometimes all of these terms become tedious. That's where proxy terms come in. You can use them to access frequently used Terms more easily. As you can see in "OM::Samples::ModsArticle":https://github.com/mediashelf/om/blob/master/lib/om/samples/mods_article.rb, we have defined a few proxy terms for convenience.
+
+<pre>
+t.publication_url(:proxy=>[:location,:url])
+t.peer_reviewed(:proxy=>[:journal,:origin_info,:issuance], :index_as=>[:facetable])
+t.title(:proxy=>[:mods,:title_info, :main_title])
+t.journal_title(:proxy=>[:journal, :title_info, :main_title])
+</pre>
+
+You can use them just like any other Term when querying the document.
+
+<pre>
+OM::Samples::ModsArticle.terminology.xpath_for(:title)
+=> "//oxns:mods/oxns:titleInfo/oxns:title"
+OM::Samples::ModsArticle.terminology.xpath_for(:journal_title)
+=> "//oxns:relatedItem[@type=\"host\"]/oxns:titleInfo/oxns:title"
+</pre>
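The pointer behavior described in this file (walking a list of term names, reporting the first one that does not exist, and letting a proxy term stand in for a longer pointer) can be modelled in a few lines of plain Ruby. This is a hypothetical sketch, not OM's Terminology: terms are nested hashes, SketchBadPointerError and sketch_resolve are made-up names, and only proxies declared at the top level (as :peer_reviewed appears to be in OM::Samples::ModsArticle) are expanded.

```ruby
# Hypothetical sketch; NOT OM::XML::Terminology. Terms are nested hashes
# keyed by symbol, and a top-level proxy term is stored as { proxy: [...] }.
class SketchBadPointerError < StandardError; end

SKETCH_TERMS = {
  journal: {
    issue: { pages: { start: {} } },
    origin_info: { issuance: {} }
  },
  peer_reviewed: { proxy: [:journal, :origin_info, :issuance] }
}.freeze

# Expand a leading proxy term, then walk the pointer one key at a time so
# the error can name the first key that is missing at its location.
def sketch_resolve(terms, pointer)
  first, *rest = pointer
  proxy = terms.dig(first, :proxy)
  pointer = proxy + rest if proxy

  node = terms
  pointer.each_with_index do |key, index|
    unless node.key?(key)
      raise SketchBadPointerError,
            "no Term at #{pointer.first(index + 1).inspect}: " \
            "everything is fine until #{key.inspect}, which doesn't exist"
    end
    node = node[key]
  end
  pointer
end

p sketch_resolve(SKETCH_TERMS, [:peer_reviewed])
# prints [:journal, :origin_info, :issuance]

begin
  sketch_resolve(SKETCH_TERMS, [:journal, :issue, :page, :start])
rescue SketchBadPointerError => e
  puts e.message
end
```

Stopping at the first unknown key is what keeps the error actionable: every key before it is known to exist, so only one name needs fixing.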
data/README.textile
CHANGED
@@ -1,16 +1,25 @@
-h1.
+h1. om (Opinionated Metadata)
 
 A library to help you tame sprawling XML schemas like MODS.
 
-h2.
-
-*
-*
-*
-
-
-
-
+h2. Tutorials
+
+* "Getting Started":http://hudson.projecthydra.org/job/om/Documentation/file.GETTING_STARTED.html
+* "Querying Documents":http://hudson.projecthydra.org/job/om/Documentation/file.QUERYING_DOCUMENTS.html
+* "Updating Documents":http://hudson.projecthydra.org/job/om/Documentation/file.UPDATING_DOCUMENTS.html
+* "Getting Fancy":http://hudson.projecthydra.org/job/om/Documentation/file.GETTING_FANCY.html
+
+h2. Common OM Patterns
+
+"Common OM Patterns":http://hudson.projecthydra.org/job/om/Documentation/file.COMMON_OM_PATTERNS.html
+
+h3. Solrizing Documents
+
+The solrizer gem provides support for indexing XML documents into Solr based on OM Terminologies. That process is documented in the "solrizer documentation":http://hudson.projecthydra.org/job/solrizer/Documentation/file.SOLRIZING_OM_DOCUMENTS.html
+
+h2. OM in the Wild
+
+We have a page on the Hydra wiki with a list of OM Terminologies in active use: "OM Terminologies in the Wild":https://wiki.duraspace.org/display/hydra/OM+Terminologies+in+the+Wild
 
 h2. Acknowledgements
 