rdf-microdata 0.1.0

data/AUTHORS ADDED
@@ -0,0 +1 @@
1
+ * Gregg Kellogg <gregg@kellogg-assoc.com>
data/README ADDED
@@ -0,0 +1,80 @@
1
+ # RDF::Microdata reader
2
+
3
+ [Microdata][] parser for RDF.rb.
4
+
5
+ ## DESCRIPTION
6
+ RDF::Microdata is a Microdata reader for Ruby using the [RDF.rb][RDF.rb] library suite.
7
+
8
+ ## FEATURES
9
+ RDF::Microdata parses [Microdata][] into statements or triples.
10
+
11
+ * Microdata parser.
12
+ * Uses Nokogiri for parsing HTML.
13
+
14
+ Install with 'gem install rdf-microdata'
15
+
16
+ ## Usage
17
+
18
+ ### Reading RDF data in the Microdata format
19
+
20
+ graph = RDF::Graph.load("etc/foaf.html", :format => :microdata)
21
+
22
+ ## Dependencies
23
+ * [RDF.rb](http://rubygems.org/gems/rdf) (>= 0.3.3)
24
+ * [Nokogiri](http://rubygems.org/gems/nokogiri) (>= 1.4.4)
25
+
26
+ ## Documentation
27
+ Full documentation available on [RubyForge](http://rdf.rubyforge.org/microdata)
28
+
29
+ ### Principal Classes
30
+ * {RDF::Microdata::Format}
31
+   Asserts the :microdata format, the text/html MIME type and the .html file extension.
32
+ * {RDF::Microdata::Reader}
33
+   Parses Microdata embedded in HTML into RDF statements.
34
+
35
+ ### Additional vocabularies
36
+
37
+ ## TODO
38
+ * Add support for LibXML and REXML bindings, and use the best available
39
+ * Consider a SAX-based parser for improved performance
40
+
41
+ ## Resources
42
+ * [RDF.rb][RDF.rb]
43
+ * [Documentation](http://rdf.rubyforge.org/microdata)
44
+ * [History](file:file.History.html)
45
+ * [Microdata][]
46
+
47
+ ## Author
48
+ * [Gregg Kellogg](http://github.com/gkellogg) - <http://kellogg-assoc.com/>
49
+
50
+ ## Contributing
51
+
52
+ * Do your best to adhere to the existing coding conventions and idioms.
53
+ * Don't use hard tabs, and don't leave trailing whitespace on any line.
54
+ * Do document every method you add using [YARD][] annotations. Read the
55
+ [tutorial][YARD-GS] or just look at the existing code for examples.
56
+ * Don't touch the `.gemspec`, `VERSION` or `AUTHORS` files. If you need to
57
+ change them, do so on your private branch only.
58
+ * Do feel free to add yourself to the `CREDITS` file and the corresponding
59
+ list in the `README`. Alphabetical order applies.
60
+ * Do note that in order for us to merge any non-trivial changes (as a rule
61
+ of thumb, additions larger than about 15 lines of code), we need an
62
+ explicit [public domain dedication][PDD] on record from you.
63
+
64
+ ## License
65
+
66
+ This is free and unencumbered public domain software. For more information,
67
+ see <http://unlicense.org/> or the accompanying {file:UNLICENSE} file.
68
+
69
+ ## FEEDBACK
70
+
71
+ * gregg@kellogg-assoc.com
72
+ * <http://rubygems.org/gems/rdf-microdata>
73
+ * <http://github.com/gkellogg/rdf-microdata>
74
+ * <http://lists.w3.org/Archives/Public/public-rdf-ruby/>
75
+
76
+ [RDF.rb]: http://rdf.rubyforge.org/
77
+ [YARD]: http://yardoc.org/
78
+ [YARD-GS]: http://rubydoc.info/docs/yard/file/docs/GettingStarted.md
79
+ [PDD]: http://lists.w3.org/Archives/Public/public-rdf-ruby/2010May/0013.html
80
+ [Microdata]: http://www.w3.org/TR/2011/WD-microdata-20110525/ "HTML Microdata"
data/UNLICENSE ADDED
@@ -0,0 +1,24 @@
1
+ This is free and unencumbered software released into the public domain.
2
+
3
+ Anyone is free to copy, modify, publish, use, compile, sell, or
4
+ distribute this software, either in source code form or as a compiled
5
+ binary, for any purpose, commercial or non-commercial, and by any
6
+ means.
7
+
8
+ In jurisdictions that recognize copyright laws, the author or authors
9
+ of this software dedicate any and all copyright interest in the
10
+ software to the public domain. We make this dedication for the benefit
11
+ of the public at large and to the detriment of our heirs and
12
+ successors. We intend this dedication to be an overt act of
13
+ relinquishment in perpetuity of all present and future rights to this
14
+ software under copyright law.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
19
+ IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
20
+ OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
21
+ ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
22
+ OTHER DEALINGS IN THE SOFTWARE.
23
+
24
+ For more information, please refer to <http://unlicense.org/>
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.1.0
@@ -0,0 +1,34 @@
1
+ $:.unshift(File.expand_path(File.join(File.dirname(__FILE__), '..')))
2
+ require 'rdf'
3
+
4
+ module RDF
5
+ ##
6
+ # **`RDF::Microdata`** is a Microdata plugin for RDF.rb.
7
+ #
8
+ # @example Requiring the `RDF::Microdata` module
9
+ # require 'rdf/microdata'
10
+ #
11
+ # @example Parsing RDF statements from an HTML file
12
+ # RDF::Microdata::Reader.open("etc/foaf.html") do |reader|
13
+ # reader.each_statement do |statement|
14
+ # puts statement.inspect
15
+ # end
16
+ # end
17
+ #
18
+ # @see http://rdf.rubyforge.org/
19
+ # @see http://www.w3.org/TR/2011/WD-microdata-20110525/
20
+ #
21
+ # @author [Gregg Kellogg](http://kellogg-assoc.com/)
22
+ module Microdata
23
+ require 'rdf/microdata/format'
24
+ require 'rdf/microdata/vocab'
25
+ autoload :Profile, 'rdf/microdata/profile'
26
+ autoload :Reader, 'rdf/microdata/reader'
27
+ autoload :VERSION, 'rdf/microdata/version'
28
+
29
+ def self.debug?; @debug; end
30
+ def self.debug=(value); @debug = value; end
31
+ end
32
+ end
33
+
34
+ require 'rdf/microdata/extensions'
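Debug output has two independent sinks: the module-level switch defined above (`RDF::Microdata.debug = true`) makes `Reader#add_debug` print each message, while an Array passed as the reader's `:debug` option collects the same messages. The stdlib-only sketch below models that arrangement so it can be run without RDF.rb; `DebugSwitch` and `DebugCollector` are hypothetical names, not part of the gem.

```ruby
# Hypothetical stand-in for the RDF::Microdata debug switch:
# a module-level flag read with debug? and set with debug=.
module DebugSwitch
  def self.debug?; @debug; end
  def self.debug=(value); @debug = value; end
end

# Hypothetical collector mirroring Reader#add_debug: print when the module
# switch is on, and append to the caller's array when one was supplied.
class DebugCollector
  def initialize(debug = nil)
    @debug = debug
  end

  def add_debug(context, message)
    line = "#{context}: #{message}"
    puts line if DebugSwitch.debug?
    @debug << line if @debug.is_a?(Array)
  end
end

messages = []
DebugCollector.new(messages).add_debug("/html/head/title", "parse_whole_doc")
# messages now holds ["/html/head/title: parse_whole_doc"]; nothing is printed
# because DebugSwitch.debug? is still nil.
```

With the real gem, the equivalent is passing `:debug => []` to `RDF::Microdata::Reader.new` and inspecting the array after iterating the statements.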
@@ -0,0 +1,34 @@
1
+ require 'nokogiri'
2
+ class Nokogiri::XML::Node
3
+ ##
4
+ # Language, taken recursively from element and ancestors
5
+ def language
6
+ @lang ||= attribute('lang') ||
7
+ attributes["lang"] ||
8
+ attributes["xml:lang"] ||
9
+ (parent && parent.element? && parent.language)
10
+ end
11
+
12
+ ##
13
+ # Get any xml:base in effect for this element
14
+ def base
15
+ if @base.nil?
16
+ @base = attributes['xml:base'] ||
17
+ (parent && parent.element? && parent.base) ||
18
+ false
19
+ end
20
+
21
+ @base == false ? nil : @base
22
+ end
23
+
24
+ def display_path
25
+ @display_path ||= case self
26
+ when Nokogiri::XML::Document then ""
27
+ when Nokogiri::XML::Element then parent ? "#{parent.display_path}/#{name}" : name
28
+ when Nokogiri::XML::Attr then "#{parent.display_path}@#{name}"
29
+ end
30
+ end
31
+ end
32
+
33
+ class Nokogiri::XML::Document
34
+ end
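The Nokogiri extensions above resolve an element's language by walking up its ancestors and build a slash-separated path used in debug messages. The sketch below replays both lookups over a hypothetical `Node` struct (not a Nokogiri class) so the inheritance rule can be tried without parsing HTML; the memoization done by the real methods is omitted for brevity.

```ruby
# Hypothetical stand-in for an HTML element: tag name, attribute hash and
# parent (nil for the root). Not a Nokogiri class; memoization is omitted.
Node = Struct.new(:name, :attrs, :parent) do
  # Nearest lang or xml:lang value on this element or any ancestor,
  # following the same fallback order as Nokogiri::XML::Node#language above.
  def language
    attrs["lang"] || attrs["xml:lang"] || (parent && parent.language)
  end

  # Slash-separated path from the root element, as used in debug messages.
  def display_path
    parent ? "#{parent.display_path}/#{name}" : "/#{name}"
  end
end

html = Node.new("html", { "lang" => "en" }, nil)
body = Node.new("body", {}, html)
para = Node.new("p", { "xml:lang" => "fr" }, body)
span = Node.new("span", {}, para)

span.language      # => "fr"  (inherited from the p element)
body.language      # => "en"  (inherited from the html element)
span.display_path  # => "/html/body/p/span"
```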
@@ -0,0 +1,21 @@
1
+ module RDF::Microdata
2
+ ##
3
+ # Microdata format specification.
4
+ #
5
+ # @example Obtaining a Microdata format class
6
+ # RDF::Format.for(:microdata) #=> RDF::Microdata::Format
7
+ # RDF::Format.for("etc/foaf.html")
8
+ # RDF::Format.for(:file_name => "etc/foaf.html")
9
+ # RDF::Format.for(:file_extension => "html")
10
+ # RDF::Format.for(:content_type => "text/html")
11
+ #
12
+ # @example Obtaining serialization format MIME types
13
+ # RDF::Format.content_types #=> {"text/html" => [RDF::Microdata::Format]}
14
+ #
15
+ # @see http://www.w3.org/TR/2011/WD-microdata-20110525/
16
+ class Format < RDF::Format
17
+ content_encoding 'utf-8'
18
+ content_type 'text/html', :extension => :html
19
+ reader { RDF::Microdata::Reader }
20
+ end
21
+ end
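The `@example` above shows `RDF::Format.for` selecting the Microdata format by symbol, file name, extension or content type. The toy registry below illustrates that kind of lookup in plain Ruby; it is a hypothetical sketch, not RDF.rb's implementation, and the `FORMATS` table and `format_for` helper are invented here.

```ruby
# Hypothetical registry: format symbol => declared content type and
# extensions, modelled on the declarations in RDF::Microdata::Format.
FORMATS = {
  :microdata => { :content_type => "text/html", :extensions => ["html"] }
}

# Resolve a format symbol from a symbol, a file name, or an options hash.
def format_for(arg)
  case arg
  when Symbol
    FORMATS.key?(arg) ? arg : nil
  when String
    format_for(:file_extension => File.extname(arg).delete("."))
  when Hash
    ext = arg[:file_extension] ||
          (arg[:file_name] && File.extname(arg[:file_name]).delete("."))
    FORMATS.each do |sym, spec|
      return sym if ext && spec[:extensions].include?(ext)
      return sym if arg[:content_type] && spec[:content_type] == arg[:content_type]
    end
    nil
  end
end

format_for(:microdata)                     # => :microdata
format_for("etc/foaf.html")                # => :microdata
format_for(:content_type => "text/html")   # => :microdata
format_for(:file_extension => "ttl")       # => nil
```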
@@ -0,0 +1,488 @@
1
+ require 'nokogiri' # FIXME: Implement using different modules as in RDF::TriX
2
+
3
+ module RDF::Microdata
4
+ ##
5
+ # A Microdata parser in Ruby
6
+ #
7
+ # Based on processing rules described here:
8
+ # @see http://dev.w3.org/html5/md/
9
+ #
10
+ # @author [Gregg Kellogg](http://kellogg-assoc.com/)
11
+ class Reader < RDF::Reader
12
+ format Format
13
+ XHTML = "http://www.w3.org/1999/xhtml"
14
+ URL_PROPERTY_ELEMENTS = %w(a area audio embed iframe img link object source track video)
15
+
16
+ class CrawlFailure < StandardError #:nodoc:
17
+ end
18
+
19
+ ##
20
+ # Initializes the Microdata reader instance.
21
+ #
22
+ # @param [Nokogiri::HTML::Document, Nokogiri::XML::Document, IO, File, String] input
23
+ # the input stream to read
24
+ # @param [Hash{Symbol => Object}] options
25
+ # any additional options
26
+ # @option options [Encoding] :encoding (Encoding::UTF_8)
27
+ # the encoding of the input stream (Ruby 1.9+)
28
+ # @option options [Boolean] :validate (false)
29
+ # whether to validate the parsed statements and values
30
+ # @option options [Boolean] :canonicalize (false)
31
+ # whether to canonicalize parsed literals
32
+ # @option options [Boolean] :intern (true)
33
+ # whether to intern all parsed URIs
34
+ # @option options [#to_s] :base_uri (nil)
35
+ # the base URI to use when resolving relative URIs
36
+ # @option options [Array] :debug
37
+ # Array to place debug messages
38
+ # @return [reader]
39
+ # @yield [reader] `self`
40
+ # @yieldparam [RDF::Reader] reader
41
+ # @yieldreturn [void] ignored
42
+ # @raise [Error]:: Raises RDF::ReaderError if _validate_
43
+ def initialize(input = $stdin, options = {}, &block)
44
+ super do
45
+ @debug = options[:debug]
46
+
47
+ @doc = case input
48
+ when Nokogiri::HTML::Document, Nokogiri::XML::Document
49
+ input
50
+ else
51
+ # Try to detect charset from input
52
+ options[:encoding] ||= input.charset if input.respond_to?(:charset)
53
+
54
+ # Otherwise, default is utf-8
55
+ options[:encoding] ||= 'utf-8'
56
+
57
+ Nokogiri::HTML.parse(input, @base_uri.to_s, options[:encoding])
58
+ end
59
+
60
+ if (@doc.nil? || @doc.root.nil?)
61
+ add_error(nil, "Empty document")
62
+ raise RDF::ReaderError, "Empty Document"
63
+ end
64
+ errors = @doc.errors.reject {|e| e.to_s =~ /Tag (audio|source|track|video|time) invalid/}
65
+ add_error(nil, "Syntax errors:\n#{errors.join("\n")}") if !errors.empty? && validate?
66
+
67
+ block.call(self) if block_given?
68
+ end
69
+ end
70
+
71
+ ##
72
+ # Iterates the given block for each RDF statement in the input.
73
+ #
74
+ # @yield [statement]
75
+ # @yieldparam [RDF::Statement] statement
76
+ # @return [void]
77
+ def each_statement(&block)
78
+ @callback = block
79
+
80
+ # parse
81
+ parse_whole_document(@doc, @base_uri)
82
+ end
83
+
84
+ ##
85
+ # Iterates the given block for each RDF triple in the input.
86
+ #
87
+ # @yield [subject, predicate, object]
88
+ # @yieldparam [RDF::Resource] subject
89
+ # @yieldparam [RDF::URI] predicate
90
+ # @yieldparam [RDF::Value] object
91
+ # @return [void]
92
+ def each_triple(&block)
93
+ each_statement do |statement|
94
+ block.call(*statement.to_triple)
95
+ end
96
+ end
97
+
98
+ private
99
+
100
+ # Keep track of allocated BNodes
101
+ def bnode(value = nil)
102
+ @bnode_cache ||= {}
103
+ @bnode_cache[value.to_s] ||= RDF::Node.new(value)
104
+ end
105
+
106
+ # Figure out the document path, if it is a Nokogiri::XML::Element or Attribute
107
+ def node_path(node)
108
+ "<#{@base_uri}>" + case node
109
+ when Nokogiri::XML::Node then node.display_path
110
+ else node.to_s
111
+ end
112
+ end
113
+
114
+ # Add debug event to debug array, if specified
115
+ #
116
+ # @param [XML Node, any] node:: XML Node or string for showing context
117
+ # @param [String] message::
118
+ def add_debug(node, message)
119
+ puts "#{node_path(node)}: #{message}" if ::RDF::Microdata::debug?
120
+ @debug << "#{node_path(node)}: #{message}" if @debug.is_a?(Array)
121
+ end
122
+
123
+ def add_error(node, message)
124
+ add_debug(node, message)
125
+ raise RDF::ReaderError, message if validate?
126
+ end
127
+
128
+ # add a statement, object can be literal or URI or bnode
129
+ #
130
+ # @param [Nokogiri::XML::Node, any] node:: XML Node or string for showing context
131
+ # @param [URI, BNode] subject:: the subject of the statement
132
+ # @param [URI] predicate:: the predicate of the statement
133
+ # @param [URI, BNode, Literal] object:: the object of the statement
134
+ # @return [Statement]:: Added statement
135
+ # @raise [ReaderError]:: Checks parameter types and raises if they are incorrect if parsing mode is _validate_.
136
+ def add_triple(node, subject, predicate, object)
137
+ statement = RDF::Statement.new(subject, predicate, object)
138
+ add_debug(node, "statement: #{RDF::NTriples.serialize(statement)}")
139
+ @callback.call(statement)
140
+ end
141
+
142
+ # Parse a Microdata document (this is *not* the recursive method)
143
+ def parse_whole_document(doc, base)
144
+ base_el = doc.at_css('html>head>base')
145
+ base = base_el.attribute('href').to_s.split('#').first if base_el
146
+
147
+ if (base)
148
+ # Strip any fragment from base
149
+ base = base.to_s.split('#').first
150
+ base = @options[:base_uri] = uri(base)
151
+ add_debug(base_el, "parse_whole_doc: base='#{base}'")
152
+ else
153
+ base = RDF::URI("")
154
+ end
155
+
156
+ ##
157
+ # 1. If the title element is not null, then generate the following triple:
158
+ #
159
+ # subject: the document's current address
160
+ # predicate: http://purl.org/dc/terms/title
161
+ # object: the concatenation of the data of all the child text nodes of the title element,
162
+ # in tree order, as a plain literal, with the language information set from
163
+ # the language of the title element, if it is not unknown.
164
+ doc.css('html>head>title').each do |title|
165
+ lang = title.language
166
+ add_triple(title, base, RDF::DC.title, RDF::Literal(title.inner_text, :language => lang))
167
+ end
168
+
169
+ # 2. For each a, area, and link element in the Document, run these substeps:
170
+ #
171
+ # * If the element does not have a rel attribute, then skip this element.
172
+ # * If the element does not have an href attribute, then skip this element.
173
+ # * If resolving the element's href attribute relative to the element is not successful,
174
+ # then skip this element.
175
+ doc.css('a, area, link').each do |el|
176
+ rel, href = el.attribute('rel'), el.attribute('href')
177
+ next unless rel && href
178
+ href = uri(href, el.base || base)
179
+ add_debug(el, "a: rel=#{rel.inspect}, href=#{href}")
180
+
181
+ # Otherwise, split the value of the element's rel attribute on spaces, obtaining list of tokens.
182
+ # Coalesce duplicate tokens in list of tokens.
183
+ tokens = rel.to_s.split(/\s+/).map do |tok|
184
+ # Convert each token in list of tokens that does not contain a U+003A COLON characters (:)
185
+ # to ASCII lowercase.
186
+ tok =~ /:/ ? tok : tok.downcase
187
+ end.uniq
188
+
189
+ # If list of tokens contains both the tokens alternate and stylesheet,
190
+ # then remove them both and replace them with the single (uppercase) token
191
+ # ALTERNATE-STYLESHEET.
192
+ if tokens.include?('alternate') && tokens.include?('stylesheet')
193
+ tokens = tokens - %w(alternate stylesheet)
194
+ tokens << 'ALTERNATE-STYLESHEET'
195
+ end
196
+
197
+ tokens.each do |tok|
198
+ tok_uri = RDF::URI(tok)
199
+ if tok !~ /:/
200
+ # For each token token in list of tokens that contains no U+003A COLON characters (:),
201
+ # generate the following triple:
202
+ add_triple(el, base, RDF::XHV[tok.gsub('#', '%23')], href)
203
+ elsif tok_uri.absolute?
204
+ # For each token token in list of tokens that is an absolute URL, generate the following triple:
205
+ add_triple(el, base, tok_uri, href)
206
+ end
207
+ end
208
+ end
209
+
210
+ # 3. For each meta element in the Document that has a name attribute and a content attribute,
211
+ doc.css('meta[name][content]').each do |el|
212
+ name, content = el.attribute('name'), el.attribute('content')
213
+ name = name.to_s
214
+ name_uri = uri(name, el.base || base)
215
+ add_debug(el, "meta: name=#{name.inspect}")
216
+ if name !~ /:/
217
+ # If the value of the name attribute contains no U+003A COLON characters (:),
218
+ # generate the following triple:
219
+ add_triple(el, base, RDF::XHV[name.downcase.gsub('#', '%23')], RDF::Literal(content, :language => el.language))
220
+ elsif name_uri.absolute?
221
+ # Otherwise, if the value of the name attribute is an absolute URL,
222
+ # generate the following triple:
223
+ add_triple(el, base, name_uri, RDF::Literal(content, :language => el.language))
224
+ end
225
+ end
226
+
227
+ # 4. For each blockquote and q element in the Document that has a cite attribute that resolves
228
+ # successfully relative to the element, generate the following triple:
229
+ doc.css('blockquote[cite], q[cite]').each do |el|
230
+ object = uri(el.attribute('cite'), el.base || base)
231
+ add_debug(el, "blockquote: cite=#{object}")
232
+ add_triple(el, base, RDF::DC.source, object)
233
+ end
234
+
235
+
236
+ # 5. Let memory be a mapping of items to subjects, initially empty.
237
+ # 6. For each element that is also a top-level microdata item, run the following steps:
238
+ # * Generate the triples for the item. Pass a reference to memory as the item/subject list.
239
+ # Let result be the subject returned.
240
+ # * Generate the following triple:
241
+ # subject the document's current address
242
+ # predicate http://www.w3.org/1999/xhtml/microdata#item
243
+ # object result
244
+ memory = {}
245
+ doc.css('[itemscope]').
246
+ select {|el| !el.has_attribute?('itemprop')}.
247
+ each do |el|
248
+ object = generate_triples(el, memory)
249
+ add_triple(el, base, RDF::MD.item, object)
250
+ end
251
+
252
+ add_debug(doc, "parse_whole_doc: traversal complete")
253
+ end
254
+
255
+ ##
256
+ # Generate triples for an item
257
+ # @param [Nokogiri::XML::Element] item
258
+ # @param [Hash{Nokogiri::XML::Element => Hash{Symbol => RDF::Resource}}] memory
259
+ # @param [Hash{Symbol => Object}] options
260
+ # @option options [RDF::Resource] :fallback_type
261
+ # @option options [String] :fallback_name
262
+ # @return [RDF::Resource]
263
+ def generate_triples(item, memory, options = {})
264
+ fallback_type = options[:fallback_type]
265
+ fallback_name = options[:fallback_name]
266
+
267
+ # 1. If there is an entry for item in memory, then let subject be the subject of that entry.
268
+ # Otherwise, if item has a global identifier and that global identifier is an absolute URL,
269
+ # let subject be that global identifier. Otherwise, let subject be a new blank node.
270
+ subject = if memory.include?(item)
271
+ memory[item][:subject]
272
+ elsif item.has_attribute?('itemid')
273
+ u = uri(item.attribute('itemid'))
274
+ end || RDF::Node.new
275
+ memory[item] ||= {}
276
+
277
+ add_debug(item, "gentrips(2): subject=#{subject.inspect}")
278
+
279
+ # 2. Add a mapping from item to subject in memory, if there isn't one already.
280
+ memory[item][:subject] ||= subject
281
+
282
+ # 3. If item has an item type and that item type is an absolute URL, let type be that item type.
283
+ # Otherwise, let type be the empty string.
284
+ type = uri(item.attribute('itemtype'))
285
+ type = '' unless type.absolute?
286
+
287
+ if type != ''
288
+ add_triple(item, subject, RDF.type, type)
289
+ # 4.2. If type does not contain a U+0023 NUMBER SIGN character (#), then append a # to type.
290
+ type += '#' unless type.to_s.include?('#')
291
+ # 4.3. If type does not have a : after its #, append a : to type.
292
+ type += ':' unless type.to_s.match(/\#:/)
293
+ elsif fallback_type
294
+ add_debug(item, "gentrips(5.2): fallback_type=#{fallback_type}, fallback_name=#{fallback_name}")
295
+ type = fallback_type
296
+ # 5.2. If type does not contain a U+0023 NUMBER SIGN character (#), then append a # to type.
297
+ type += '#' unless type.to_s.include?('#')
298
+ # 5.3. If type does not have a : after its #, append a : to type.
299
+ type += ':' unless type.to_s.match(/\#:/)
300
+ # 5.4. If the last character of type is not a :, append %20 to type.
301
+ type += '%20' unless type.to_s[-1] == ':'
302
+ # 5.5. Append the fragment-escaped value of fallback name to type.
303
+ type += fallback_name.to_s.gsub('#', '%23')
304
+ end
305
+
306
+ add_debug(item, "gentrips(6): type=#{type.inspect}")
307
+
308
+ # 6. For each element _element_ that has one or more property names and is one of the
309
+ # properties of the item _item_, in the order those elements are given by the algorithm
310
+ # that returns the properties of an item, run the following substep:
311
+ props = item_properties(item)
312
+
313
+ # 6.1. For each name name in element's property names, run the following substeps:
314
+ props.each do |element|
315
+ element.attribute('itemprop').to_s.split(' ').each do |name|
316
+ add_debug(element, "gentrips(6.1): name=#{name.inspect}")
317
+ # If type is the empty string and name is not an absolute URL, then abort these substeps.
318
+ name_uri = RDF::URI(name)
319
+ next if type == '' && !name_uri.absolute?
320
+
321
+ value = property_value(element)
322
+ add_debug(element, "gentrips(6.1.2) value=#{value.inspect}")
323
+
324
+ if value.is_a?(Hash)
325
+ value = generate_triples(element, memory, :fallback_type => type, :fallback_name => name)
326
+ end
327
+
328
+ add_debug(element, "gentrips(6.1.3): value=#{value.inspect}")
329
+
330
+ predicate = if name_uri.absolute?
331
+ name_uri
332
+ elsif !name.include?(':')
333
+ s = type.to_s
334
+ s += '%20' unless s[-1] == ':'
335
+ s += name
336
+ RDF::MD[s.gsub('#', '%23')]
337
+ end
338
+ add_debug(element, "gentrips(6.1.5): predicate=#{predicate}")
339
+
340
+ add_triple(element, subject, predicate, value) if predicate
341
+ end
342
+ end
343
+
344
+ subject
345
+ end
346
+
347
+ ##
348
+ # To find the properties of an item defined by the element root, the user agent must try
349
+ # to crawl the properties of the element root, with an empty list as the value of memory:
350
+ # if this fails, then the properties of the item defined by the element root is an empty
351
+ # list; otherwise, it is the returned list.
352
+ #
353
+ # @param [Nokogiri::XML::Element] item
354
+ # @return [Array<Nokogiri::XML::Element>]
355
+ # List of property elements for an item
356
+ def item_properties(item)
357
+ add_debug(item, "item_properties")
358
+ results, errors = crawl_properties(item, [])
359
+ raise CrawlFailure, "item_props: errors=#{errors}" if errors > 0
360
+ results
361
+ rescue CrawlFailure => e
362
+ add_error(item, e.message)
363
+ return []
364
+ end
365
+
366
+ ##
367
+ # To crawl the properties of an element root with a list memory, the user agent must run
368
+ # the following steps. These steps either fail or return a list with a count of errors.
369
+ # The count of errors is used as part of the authoring conformance criteria below.
370
+ #
371
+ # @param [Nokogiri::XML::Element] root
372
+ # @param [Array<Nokogiri::XML::Element>] memory
373
+ # @return [Array<Array<Nokogiri::XML::Element>, Integer>]
374
+ # Resultant elements and error count
375
+ def crawl_properties(root, memory)
376
+ # 1. If root is in memory, then the algorithm fails; abort these steps.
377
+ raise CrawlFailure, "crawl_props mem already has #{root.inspect}" if memory.include?(root)
378
+
379
+ # 2. Collect all the elements in the item root; let results be the resulting
380
+ # list of elements, and errors be the resulting count of errors.
381
+ results, errors = elements_in_item(root)
382
+ add_debug(root, "crawl_properties results=#{results.inspect}, errors=#{errors}")
383
+
384
+ # 3. Remove any elements from results that do not have an itemprop attribute specified.
385
+ results = results.select {|e| e.has_attribute?('itemprop')}
386
+
387
+ # 4. Let new memory be a new list consisting of the old list memory with the addition of root.
388
+ new_memory = memory + [root]
389
+
390
+ # 5. For each element in results that has an itemscope attribute specified,
391
+ # crawl the properties of the element, with new memory as the memory.
392
+ results.select {|e| e.has_attribute?('itemscope')}.each do |element|
393
+ begin
394
+ crawl_properties(element, new_memory)
395
+ rescue CrawlFailure => e
396
+ # If this fails, then remove the element from results and increment errors.
397
+ # (If it succeeds, the return value is discarded.)
398
+ results -= [element]
399
+ add_error(element, e.message)
400
+ errors += 1
401
+ end
402
+ end
403
+
404
+ [results, errors]
405
+ end
406
+
407
+ ##
408
+ # To collect all the elements in the item root, the user agent must run these steps.
409
+ # They return a list of elements and a count of errors.
410
+ #
411
+ # @param [Nokogiri::XML::Element] root
412
+ # @return [Array<Array<Nokogiri::XML::Element>, Integer>]
413
+ # Resultant elements and error count
414
+ def elements_in_item(root)
415
+ # Let results and pending be empty lists of elements.
416
+ # Let errors be zero.
417
+ results, errors = [], 0
418
+
419
+ # Add all the children elements of root to pending.
420
+ pending = root.elements
421
+
422
+ # If root has an itemref attribute, split the value of that itemref attribute on spaces.
423
+ # For each resulting token ID,
424
+ root.attribute('itemref').to_s.split(' ').each do |id|
425
+ add_debug(root, "elements_in_item itemref id #{id}")
426
+ # if there is an element in the home subtree of root with the ID ID,
427
+ # then add the first such element to pending.
428
+ id_elem = @doc.at_css("##{id}")
429
+ pending << id_elem if id_elem
430
+ end
431
+ add_debug(root, "elements_in_item pending #{pending.inspect}")
432
+
433
+ # Loop: Remove an element from pending and let current be that element.
434
+ while current = pending.shift
435
+ if results.include?(current)
436
+ # If current is already in results, increment errors.
437
+ add_error(current, "elements_in_item: results already includes #{current.inspect}")
438
+ errors += 1
439
+ elsif !current.has_attribute?('itemscope')
440
+ # If current is not already in results and current does not have an itemscope attribute,
441
+ # then: add all the child elements of current to pending.
442
+ pending += current.elements
443
+ end
444
+
445
+ # If current is not already in results, then: add current to results.
446
+ results << current unless results.include?(current)
447
+ end
448
+
449
+ [results, errors]
450
+ end
451
+
452
+ ##
453
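+ # Determine the property value of an element: {} for a nested item, a resolved URI for URL property elements, or a literal value otherwise.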
454
+ def property_value(element)
455
+ add_debug(element, "property_value(#{element.inspect})")
456
+ case
457
+ when element.has_attribute?('itemscope')
458
+ {}
459
+ when element.name == 'meta'
460
+ element.attribute('content').to_s
461
+ when %w(audio embed iframe img source track video).include?(element.name)
462
+ uri(element.attribute('src'), element.base)
463
+ when %w(a area link).include?(element.name)
464
+ uri(element.attribute('href'), element.base)
465
+ when %w(object).include?(element.name)
466
+ uri(element.attribute('data'), element.base)
467
+ when %w(time).include?(element.name) && element.has_attribute?('datetime')
468
+ RDF::Literal::DateTime.new(element.attribute('datetime'))
469
+ else
470
+ RDF::Literal.new(element.text, :language => element.language)
471
+ end
472
+ end
473
+
474
+ # Fixme, what about xml:base relative to element?
475
+ def uri(value, base = nil)
476
+ value = if base
477
+ base = uri(base) unless base.is_a?(RDF::URI)
478
+ base.join(value)
479
+ else
480
+ RDF::URI(value)
481
+ end
482
+ value.validate! if validate?
483
+ value.canonicalize! if canonicalize?
484
+ value = RDF::URI.intern(value) if intern?
485
+ value
486
+ end
487
+ end
488
+ end
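`elements_in_item` above collects an item's candidate property elements: it starts from the item's children plus any elements referenced by `itemref`, descends into elements without `itemscope`, and counts every element reached a second time as an error. The sketch below replays that loop over a hypothetical `Item` struct and an ID table (both invented here, standing in for Nokogiri nodes and `@doc.at_css("#id")`); the cycle check performed by `crawl_properties` is omitted.

```ruby
# Hypothetical element: tag name, child elements, and the Microdata
# attributes the crawl inspects. Not a Nokogiri class.
Item = Struct.new(:name, :children, :itemscope, :itemprop, :itemref)

# Sketch of Reader#elements_in_item: returns [results, errors].
# ids maps an ID to its element, standing in for @doc.at_css("#id").
def elements_in_item(root, ids)
  results, errors = [], 0
  pending = root.children.dup
  root.itemref.to_s.split(" ").each do |id|
    pending << ids[id] if ids[id]
  end

  while (current = pending.shift)
    # Compare by identity so two look-alike elements are not confused.
    if results.any? { |el| el.equal?(current) }
      errors += 1                       # reached twice: count an error
    else
      results << current
      # Nested items keep their own properties, so do not descend into them.
      pending.concat(current.children) unless current.itemscope
    end
  end
  [results, errors]
end

# The item's properties are the collected elements that carry itemprop.
def property_names(root, ids)
  elements_in_item(root, ids).first.select(&:itemprop).map(&:itemprop)
end

name    = Item.new("span", [], false, "name", nil)
city    = Item.new("span", [], false, "addressLocality", nil)
address = Item.new("div", [city], true, "address", nil)
email   = Item.new("p", [], false, "email", nil)
person  = Item.new("div", [name, address], true, nil, "contact")

property_names(person, "contact" => email)  # => ["name", "address", "email"]
```

`addressLocality` is not reported for `person` because it belongs to the nested `address` item, which the reader handles in a recursive `generate_triples` call.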
@@ -0,0 +1,18 @@
1
+ module RDF::Microdata::VERSION
2
+ VERSION_FILE = File.join(File.expand_path(File.dirname(__FILE__)), "..", "..", "..", "VERSION")
3
+ MAJOR, MINOR, TINY, EXTRA = File.read(VERSION_FILE).chop.split(".")
4
+
5
+ STRING = [MAJOR, MINOR, TINY, EXTRA].compact.join('.')
6
+
7
+ ##
8
+ # @return [String]
9
+ def self.to_s() STRING end
10
+
11
+ ##
12
+ # @return [String]
13
+ def self.to_str() STRING end
14
+
15
+ ##
16
+ # @return [Array<String>]
17
+ def self.to_a() STRING.split(".") end
18
+ end
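`RDF::Microdata::VERSION` reads the `VERSION` file with `chop`, which always removes the final character: that is the trailing newline when one is present, but the last digit of the version when the file has no trailing newline. The components are also strings, not integers. The hypothetical `parse_version` helper below (not part of the gem) uses `chomp`, which strips only a trailing line ending, to show the difference.

```ruby
# Hypothetical helper: split VERSION-file contents into components.
# chomp removes only a trailing line ending, so the result is the same
# whether or not the file ends with a newline.
def parse_version(text)
  text.chomp.split(".")
end

parse_version("0.1.0\n")             # => ["0", "1", "0"]
parse_version("0.1.0")               # => ["0", "1", "0"]
parse_version("0.1.0").map(&:to_i)   # => [0, 1, 0]

# By contrast, chop always drops the final character:
"0.1.0".chop.split(".")              # => ["0", "1"]
```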
@@ -0,0 +1,5 @@
1
+ module RDF
2
+ class MD < Vocabulary("http://www.w3.org/1999/xhtml/microdata#"); end
3
+ class Schema < Vocabulary("http://schema.org/"); end
4
+ class XHV < Vocabulary("http://www.w3.org/1999/xhtml/vocab#"); end
5
+ end
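The reader mints predicates in two of these vocabularies: colon-free `rel` tokens become `RDF::XHV` terms, and item properties whose names are not absolute URLs become `RDF::MD` terms built from the item type and the property name (steps 4 to 6 of `generate_triples`). The plain-Ruby sketch below reproduces that string construction with hypothetical helpers (`rel_tokens`, `xhv_term`, `md_predicate`) that return URI strings instead of `RDF::URI` objects, so it runs without RDF.rb.

```ruby
MD_NS  = "http://www.w3.org/1999/xhtml/microdata#"
XHV_NS = "http://www.w3.org/1999/xhtml/vocab#"

# Normalize a rel attribute as in step 2 of the reader: split on whitespace,
# lowercase tokens without a colon, drop duplicates, and merge the pair
# alternate + stylesheet into ALTERNATE-STYLESHEET.
def rel_tokens(rel)
  tokens = rel.split.map { |t| t.include?(":") ? t : t.downcase }.uniq
  if tokens.include?("alternate") && tokens.include?("stylesheet")
    tokens -= %w(alternate stylesheet)
    tokens << "ALTERNATE-STYLESHEET"
  end
  tokens
end

# XHV term for a colon-free token, with "#" fragment-escaped as "%23".
def xhv_term(token)
  XHV_NS + token.gsub("#", "%23")
end

# MD predicate for a non-URL property name on an item of the given type:
# ensure a "#" and a "#:" in the type, join with "%20" unless the type
# already ends in ":", then fragment-escape every "#".
def md_predicate(type, name)
  s = type.dup
  s += "#" unless s.include?("#")
  s += ":" unless s.include?("#:")
  s += "%20" unless s.end_with?(":")
  MD_NS + (s + name).gsub("#", "%23")
end

rel_tokens("Alternate stylesheet NEXT next")  # => ["next", "ALTERNATE-STYLESHEET"]
xhv_term("next")  # => "http://www.w3.org/1999/xhtml/vocab#next"
md_predicate("http://schema.org/Person", "name")
# => "http://www.w3.org/1999/xhtml/microdata#http://schema.org/Person%23:name"
```

The `%20` separator only appears for nested items whose fallback type already carries a property name, e.g. `md_predicate("http://schema.org/Person#:address", "street")`.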
metadata ADDED
@@ -0,0 +1,141 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: rdf-microdata
3
+ version: !ruby/object:Gem::Version
4
+ prerelease:
5
+ version: 0.1.0
6
+ platform: ruby
7
+ authors:
8
+ - Gregg Kellogg
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+
13
+ date: 2011-06-29 00:00:00 -07:00
14
+ default_executable:
15
+ dependencies:
16
+ - !ruby/object:Gem::Dependency
17
+ name: rdf
18
+ prerelease: false
19
+ requirement: &id001 !ruby/object:Gem::Requirement
20
+ none: false
21
+ requirements:
22
+ - - ">="
23
+ - !ruby/object:Gem::Version
24
+ version: 0.3.3
25
+ type: :runtime
26
+ version_requirements: *id001
27
+ - !ruby/object:Gem::Dependency
28
+ name: nokogiri
29
+ prerelease: false
30
+ requirement: &id002 !ruby/object:Gem::Requirement
31
+ none: false
32
+ requirements:
33
+ - - ">="
34
+ - !ruby/object:Gem::Version
35
+ version: 1.4.4
36
+ type: :runtime
37
+ version_requirements: *id002
38
+ - !ruby/object:Gem::Dependency
39
+ name: yard
40
+ prerelease: false
41
+ requirement: &id003 !ruby/object:Gem::Requirement
42
+ none: false
43
+ requirements:
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: 0.6.0
47
+ type: :development
48
+ version_requirements: *id003
49
+ - !ruby/object:Gem::Dependency
50
+ name: rspec
51
+ prerelease: false
52
+ requirement: &id004 !ruby/object:Gem::Requirement
53
+ none: false
54
+ requirements:
55
+ - - ">="
56
+ - !ruby/object:Gem::Version
57
+ version: 2.5.0
58
+ type: :development
59
+ version_requirements: *id004
60
+ - !ruby/object:Gem::Dependency
61
+ name: rdf-spec
62
+ prerelease: false
63
+ requirement: &id005 !ruby/object:Gem::Requirement
64
+ none: false
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: 0.3.2
69
+ type: :development
70
+ version_requirements: *id005
71
+ - !ruby/object:Gem::Dependency
72
+ name: rdf-n3
73
+ prerelease: false
74
+ requirement: &id006 !ruby/object:Gem::Requirement
75
+ none: false
76
+ requirements:
77
+ - - ">="
78
+ - !ruby/object:Gem::Version
79
+ version: 0.3.3
80
+ type: :development
81
+ version_requirements: *id006
82
+ - !ruby/object:Gem::Dependency
83
+ name: rdf-isomorphic
84
+ prerelease: false
85
+ requirement: &id007 !ruby/object:Gem::Requirement
86
+ none: false
87
+ requirements:
88
+ - - ">="
89
+ - !ruby/object:Gem::Version
90
+ version: 0.3.4
91
+ type: :development
92
+ version_requirements: *id007
93
+ description: Microdata reader for Ruby.
94
+ email: public-rdf-ruby@w3.org
95
+ executables: []
96
+
97
+ extensions: []
98
+
99
+ extra_rdoc_files: []
100
+
101
+ files:
102
+ - AUTHORS
103
+ - README
104
+ - UNLICENSE
105
+ - VERSION
106
+ - lib/rdf/microdata/extensions.rb
107
+ - lib/rdf/microdata/format.rb
108
+ - lib/rdf/microdata/reader.rb
109
+ - lib/rdf/microdata/version.rb
110
+ - lib/rdf/microdata/vocab.rb
111
+ - lib/rdf/microdata.rb
112
+ has_rdoc: false
113
+ homepage: http://github.com/gkellogg/rdf-microdata
114
+ licenses:
115
+ - Public Domain
116
+ post_install_message:
117
+ rdoc_options: []
118
+
119
+ require_paths:
120
+ - lib
121
+ required_ruby_version: !ruby/object:Gem::Requirement
122
+ none: false
123
+ requirements:
124
+ - - ">="
125
+ - !ruby/object:Gem::Version
126
+ version: 1.8.1
127
+ required_rubygems_version: !ruby/object:Gem::Requirement
128
+ none: false
129
+ requirements:
130
+ - - ">="
131
+ - !ruby/object:Gem::Version
132
+ version: "0"
133
+ requirements: []
134
+
135
+ rubyforge_project: rdf-microdata
136
+ rubygems_version: 1.6.2
137
+ signing_key:
138
+ specification_version: 3
139
+ summary: Microdata reader for Ruby.
140
+ test_files: []
141
+
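The gemspec declares its runtime dependencies as `>=` constraints: `rdf >= 0.3.3` and `nokogiri >= 1.4.4`. RubyGems' own `Gem::Requirement` and `Gem::Version` classes can evaluate such constraints; the snippet below checks a few candidate versions against them. The candidate version numbers and the `satisfies?` helper are arbitrary examples, not part of the gem.

```ruby
require "rubygems"

# Runtime constraints copied from the gemspec metadata above.
RUNTIME_DEPS = {
  "rdf"      => Gem::Requirement.new(">= 0.3.3"),
  "nokogiri" => Gem::Requirement.new(">= 1.4.4")
}

# True when the given version string satisfies the named dependency.
def satisfies?(gem_name, version)
  RUNTIME_DEPS.fetch(gem_name).satisfied_by?(Gem::Version.new(version))
end

satisfies?("rdf", "0.3.4")       # => true
satisfies?("nokogiri", "1.3.3")  # => false
```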