RubyGems - rdf-microdata - Versions diffs - 0.2.2 → 0.2.3 - Mend

rdf-microdata 0.2.2 → 0.2.3

Files changed (11) hide show

data/README +21 -45
data/VERSION +1 -1
data/etc/doap.html +42 -0
data/etc/registry.json +39 -0
data/lib/rdf/microdata.rb +0 -2
data/lib/rdf/microdata/reader.rb +316 -193
data/lib/rdf/microdata/reader/nokogiri.rb +232 -0
data/lib/rdf/microdata/reader/rexml.rb +277 -0
data/lib/rdf/microdata/vocab.rb +1 -1
metadata +58 -21
data/lib/rdf/microdata/extensions.rb +0 -34

data/README CHANGED Viewed

@@ -6,13 +6,20 @@
 RDF::Microdata is a Microdata reader for Ruby using the [RDF.rb][RDF.rb] library suite.
 ## FEATURES
-RDF::Microdata parses [Microdata][] into statements or triples.
+RDF::Microdata parses [Microdata][] into statements or triples using the rules defined in [Microdata RDF][].
 * Microdata parser.
-* Uses Nokogiri for parsing HTML
+* If available, Uses Nokogiri for parsing HTML/SVG, falls back to REXML otherwise (and for JRuby)
 Install with 'gem install rdf-microdata'
+### Living implementation
+Microdata to RDF transformation is undergoing active development. This implementation attempts to be up-to-date
+as of the time of release, and is being used in developing the [Microdata RDF][] specification
+### Microdata Registry
+The parser uses a build-in version of the [Microdata RDF][] registry.
 ## Usage
 ### Reading RDF data in the Microdata format
@@ -20,49 +27,14 @@ Install with 'gem install rdf-microdata'
     graph = RDF::Graph.load("etc/foaf.html", :format => :microdata)
 ## Note
-The Microdata editor has recently [dropped support for RDF
-conversion](http://html5.org/tools/web-apps-tracker?from=6426&to=6427), as a result, this gem is being used to
-investigate ways in which Microdata might have more satisfactory RDF generation.
-### Generating RDF friendly URIs from terms
-If the `@itemprop` is included within an item having an `@itemtype`,
-the URI of the `@itemtype` will be used for generating a term URI. The type URI will be trimmed following
-the last '#' or '/' character, and the term will be appended to the resulting URI. This is in keeping
-with standard convention for defining properties and classes within an RDFS or OWL vocabulary.
-For example:
-    <div itemscope itemtype="http://schema.org/Person">
-      My name is <span itemprop="name">Gregg</span>
-    </div>
-Without the `:rdf\_terms` option, this would create the following statements:
-    @prefix md: <http://www.w3.org/1999/xhtml/microdata#> .
-    @prefix schema: <http://schema.org/> .
-    <> md:item [
-      a schema:Person;
-      <http://www.w3.org/1999/xhtml/microdata#http://schema.org/Person%23:name> "Gregg"
-    ] .
-With the `:rdf\_terms` option, this becomes:
-    @prefix md: <http://www.w3.org/1999/xhtml/microdata#> .
-    @prefix schema: <http://schema.org/> .
-    <> md:item [ a schema:Person; schema:name "Gregg" ] .
-### Improve xsd:date, xsd:time, xsd:dateTime and xsd:duration generation from _time_ element
-Use the lexical form of the @datetime attribute of the _time_ element to determine the specific type
-of the generated literal.
-### Remove implicit RDF triple generation
-html>head>title and anchor (_a_) elements no longer generate triples without @item* properties
+This spec is based on the W3C HTML Data Task Force specification and does not support
+GRDDL-type triple generation, such as for html>head>title and <a>
 ## Dependencies
 * [RDF.rb](http://rubygems.org/gems/rdf) (>= 0.3.4)
-* [Nokogiri](http://rubygems.org/gems/nokogiri) (>= 1.3.3)
+* [RDF::XSD](http://rubygems.org/gems/rdf-xsd) (>= 0.3.4)
+* [HTMLEntities](https://rubygems.org/gems/htmlentities) ('>= 4.3.0')
+* Soft dependency on [Nokogiri](http://rubygems.org/gems/nokogiri) (>= 1.5.0)
 ## Documentation
 Full documentation available on [Rubydoc.info][Microdata doc]
@@ -71,6 +43,8 @@ Full documentation available on [Rubydoc.info][Microdata doc]
 * {RDF::Microdata::Format}
   Asserts :html format, text/html mime-type and .html file extension.
 * {RDF::Microdata::Reader}
+  * {RDF::Microdata::Reader::Nokogiri}
+  * {RDF::Microdata::Reader::REXML}
 ### Additional vocabularies
@@ -81,8 +55,9 @@ Full documentation available on [Rubydoc.info][Microdata doc]
 ## Resources
 * [RDF.rb][RDF.rb]
 * [Documentation](http://rdf.rubyforge.org/microdata)
-* [History](file:file.History.html)
+* [History](file:History.md)
 * [Microdata][]
+* [Microdata RDF][]
 ## Author
 * [Gregg Kellogg](http://github.com/gkellogg) - <http://kellogg-assoc.com/>
@@ -117,5 +92,6 @@ see <http://unlicense.org/> or the accompanying {file:UNLICENSE} file.
 [YARD]:             http://yardoc.org/
 [YARD-GS]:          http://rubydoc.info/docs/yard/file/docs/GettingStarted.md
 [PDD]:              http://lists.w3.org/Archives/Public/public-rdf-ruby/2010May/0013.html
-[Microdata]:        http://www.w3.org/TR/2011/WD-microdata-20110525/     "HTML Microdata"
+[Microdata]:        http://dev.w3.org/html5/md/Overview.html                                      "HTML Microdata"
+[Microdata RDF]:    https://dvcs.w3.org/hg/htmldata/raw-file/default/microdata-rdf/index.html     "Microdata to RDF"
 [Microdata doc]:    http://rubydoc.info/github/gkellogg/rdf-microdata/frames

data/VERSION CHANGED Viewed

	@@ -1 +1 @@
1	- 0.2.2
1	+ 0.2.3

data/etc/doap.html ADDED Viewed

@@ -0,0 +1,42 @@
+<!DOCTYPE html>
+<html itemscope itemid="http://rubygems.org/gems/rdf-microdata" itemtype="http://usefulinc.com/ns/doap#Project">
+  <head>
+    <title lang="en" itemprop="shortdesc">Microdata reader for Ruby.</title>
+  </head>
+  <body about="" typeof="Project">
+    <p>Project description for <span itemprop="name">RDF::Microdata</span>.</p>
+    <p lang="en" itemprop="description">
+      RDF::Microdata is an Microdata reader for Ruby using the RDF.rb library suite.
+    </p>
+    <dl>
+      <dt>Creator</dt><dd>
+        <a itemprop="http://purl.org/dc/terms/creator developer documenter maintainer http://xmlns.com/foaf/0.1/creator" href="http://greggkellogg.net/foaf#me"
+        >Gregg Kellogg</a>
+      </dd>
+      <dt>Created</dt><time itemprop="created" datetime="2011-08-29"/></dd>
+      <dt>Blog</dt><dd><a href="http://greggkellogg.net/" itemprop="blog">http://greggkellogg.net/</a></dd>
+      <dt>Bug DB</dt><dd>
+        <a href="http://github.com/gkellogg/rdf-microdata/issues" itemprop="bug-database">
+          http://github.com/gkellogg/rdf-microdata/issues
+        </a>
+      </dd>
+      <dt>Category</dt><dd itemprop="category">
+        <a href="http://dbpedia.org/resource/Resource_Description_Framework">Resource Description Framework</a>
+        for
+        <a itemprop="programming-language" href="http://dbpedia.org/resource/Ruby_(programming_language)">Ruby</a>
+      </dd>
+      <dt>Download</dt><dd><a href="http://rubygems.org/gems/rdf-microdata" itemprop="download-page">
+        http://rubygems.org/gems/rdf-microdata
+      </a></dd>
+      <dt>Home Page</dt><dd><a href="http://github.com/gkellogg/rdf-microdata" itemprop="homepage">
+        http://github.com/gkellogg/rdf-microdata
+      </a></dd>
+      <dt>License</dt><dd>
+        <a href="http://creativecommons.org/licenses/publicdomain/" itemprop="license">Public Domain</a>
+      </dd>
+      <dt>Mailing List</dt><dd><a href="http://lists.w3.org/Archives/Public/public-rdf-ruby/" itemprop="mailing-list">
+        http://lists.w3.org/Archives/Public/public-rdf-ruby/
+      </a></dd>
+    </dl>
+  </body>
+</html>

data/etc/registry.json ADDED Viewed

@@ -0,0 +1,39 @@
+{
+  "http://schema.org/": {
+    "propertyURI":    "vocabulary",
+    "multipleValues": "unordered",
+    "properties": {
+      "blogPosts": {"multipleValues": "list"},
+      "breadcrumb": {"multipleValues": "list"},
+      "byArtist": {"multipleValues": "list"},
+      "creator": {"multipleValues": "list"},
+      "episodes": {"multipleValues": "list"},
+      "events": {"multipleValues": "list"},
+      "founders": {"multipleValues": "list"},
+      "itemListElement": {"multipleValues": "list"},
+      "musicGroupMember": {"multipleValues": "list"},
+      "performerIn": {"multipleValues": "list"},
+      "performers": {"multipleValues": "list"},
+      "producer": {"multipleValues": "list"},
+      "recipeInstructions": {"multipleValues": "list"},
+      "seasons": {"multipleValues": "list"},
+      "subEvents": {"multipleValues": "list"},
+      "tracks": {"multipleValues": "list"}
+    }
+  },
+  "http://microformats.org/profile/hcard": {
+    "propertyURI":    "vocabulary",
+    "multipleValues": "unordered"
+  },
+  "http://microformats.org/profile/hcalendar#": {
+    "propertyURI":    "vocabulary",
+    "multipleValues": "unordered",
+    "properties": {
+      "categories": {"multipleValues": "list"}
+    }
+  },
+  "http://n.whatwg.org/work": {
+    "propertyURI":    "contextual",
+    "multipleValues": "list"
+  }
+}

data/lib/rdf/microdata.rb CHANGED Viewed

@@ -30,5 +30,3 @@ module RDF
     def self.debug=(value); @debug = value; end
   end
 end
-require 'rdf/microdata/extensions'

data/lib/rdf/microdata/reader.rb CHANGED Viewed

@@ -1,24 +1,33 @@
-require 'nokogiri'  # FIXME: Implement using different modules as in RDF::TriX
+begin
+  raise LoadError, "not with java" if RUBY_PLATFORM == "java"
+  require 'nokogiri'
+rescue LoadError => e
+  :rexml
+end
+require 'rdf/xsd'
+require 'json'
 module RDF::Microdata
   ##
   # An Microdata parser in Ruby
   #
   # Based on processing rules, amended with the following:
-  # * property generation from tokens now uses the associated @itemtype as the basis for generation
-  # * implicit triples are not generated, only those with @item*
-  # * @datetime values are scanned lexically to find appropriate datatype
   #
-  # @see http://dev.w3.org/html5/md/
+  # @see https://dvcs.w3.org/hg/htmldata/raw-file/0d6b89f5befb/microdata-rdf/index.html
   # @author [Gregg Kellogg](http://kellogg-assoc.com/)
   class Reader < RDF::Reader
     format Format
-    XHTML = "http://www.w3.org/1999/xhtml"
     URL_PROPERTY_ELEMENTS = %w(a area audio embed iframe img link object source track video)
+    DEFAULT_REGISTRY = File.expand_path(File.join(File.dirname(__FILE__), "..", "..", "..", "etc", "registry.json"))
     class CrawlFailure < StandardError  #:nodoc:
     end
+    # Returns the HTML implementation module for this reader instance.
+    #
+    # @attr_reader [Module]
+    attr_reader :implementation
     ##
     # Returns the base URI determined by this reader.
     #
@@ -31,6 +40,124 @@ module RDF::Microdata
       @options[:base_uri]
     end
+    # Interface to registry
+    class Registry
+      ##
+      # Initialize the registry from a URI or file path
+      #
+      # @param [Hash] json
+      def self.load_registry(json)
+        @prefixes = {}
+        json.each do |prefix, elements|
+          propertyURI = elements.fetch("propertyURI", "vocabulary").to_sym
+          multipleValues = elements.fetch("multipleValues", "unordered").to_sym
+          properties = elements.fetch("properties", {})
+          @prefixes[prefix] = Registry.new(prefix, propertyURI, multipleValues, properties)
+        end
+      end
+      ##
+      # True if registry has already been loaded
+      def self.loaded?
+        @prefixes.is_a?(Hash)
+      end
+      ##
+      # Initialize registry for a particular prefix URI
+      #
+      # @param [RDF::URI] prefixURI
+      # @param [#to_sym] propertyURI (:vocabulary)
+      # @param [#to_sym] multipleValues (:unordered)
+      # @param [Hash] properties ({})
+      def initialize(prefixURI, propertyURI = :vocabulary, multipleValues = :unordered, properties = {})
+        @scheme = propertyURI.to_sym
+        @multipleValues = multipleValues.to_sym
+        @properties = properties
+        if @scheme == :vocabulary
+          @property_base = prefixURI.to_s
+          @property_base += '#' unless %w(/ #).include?(@property_base[-1]) # Append a '#' for fragment if necessary
+        else
+          @property_base = 'http://www.w3.org/ns/md?type='
+        end
+      end
+      ##
+      # Find a registry entry given a type URI
+      #
+      # @param [RDF::URI] type
+      # @return [Registry]
+      def self.find(type)
+        @prefixes.select do |key, value|
+          type.to_s.index(key) == 0
+        end.values.first
+      end
+      ##
+      # Generate a predicateURI given a `name`
+      #
+      # @param [#to_s] name
+      # @param [Hash{}] ec Evaluation Context
+      # @return [RDF::URI]
+      def predicateURI(name, ec)
+        u = RDF::URI(name)
+        return u if u.absolute?
+        n = frag_escape(name)
+        if ec[:current_type].nil?
+          u = RDF::URI(ec[:document_base].to_s)
+          u.fragment = frag_escape(name)
+          u
+        elsif @scheme == :vocabulary
+          # If scheme is vocabulary return the URI reference constructed by appending the fragment escaped value of name
+          # to current vocabulary, separated by a U+0023 NUMBER SIGN character (#) unless the current vocabulary ends
+          # with either a U+0023 NUMBER SIGN character (#) or SOLIDUS U+002F (/).
+          RDF::URI(@property_base + n)
+        else  # @scheme == :contextual
+          if ec[:current_type].to_s.index(@property_base) == 0
+            # return the concatenation of s, a U+002E FULL STOP character (.) and the fragment-escaped value of name.
+            RDF::URI(@property_base + '.' + n)
+          else
+            # return the concatenation of http://www.w3.org/ns/md?type=, the fragment-escaped value of s,
+            # the string &prop=, and the fragment-escaped value of name
+            RDF::URI(@property_base + frag_escape(ec[:current_type]) + '?prop=' + n)
+          end
+        end
+      end
+      ##
+      # Turn a predicateURI into a simple token
+      # @param [RDF::URI] predicateURI
+      # @return [String]
+      def tokenize(predicateURI)
+        case @scheme
+        when :vocabulary
+          predicateURI.to_s.sub(@property_base, '')
+        when :contextual
+          predicateURI.to_s.split('?prop=').last.split('.').last
+        end
+      end
+      ##
+      # Determine if property should be serialized as a list or not
+      # @param [RDF::URI] predicateURI
+      # @return [Boolean]
+      def as_list(predicateURI)
+        tok = tokenize(predicateURI)
+        if @properties[tok].is_a?(Hash)
+          @properties[tok]["multipleValues"].to_sym == :list
+        else
+          @multipleValues == :list
+        end
+      end
+      ##
+      # Fragment escape a name
+      def frag_escape(name)
+        name.to_s.gsub(/["#%<>\[\\\]^{|}]/) {|c| '%' + c.unpack('H2' * c.bytesize).join('%').upcase}
+      end
+    end
     ##
     # Initializes the Microdata reader instance.
     #
@@ -38,6 +165,8 @@ module RDF::Microdata
     #   the input stream to read
     # @param  [Hash{Symbol => Object}] options
     #   any additional options
+    # @option options [Symbol] :library (:nokogiri)
+    #   One of :nokogiri or :rexml. If nil/unspecified uses :nokogiri if available, :rexml otherwise.
     # @option options [Encoding] :encoding     (Encoding::UTF_8)
     #   the encoding of the input stream (Ruby 1.9+)
     # @option options [Boolean]  :validate     (false)
@@ -48,6 +177,7 @@ module RDF::Microdata
     #   whether to intern all parsed URIs
     # @option options [#to_s]    :base_uri     (nil)
     #   the base URI to use when resolving relative URIs
+    # @option options [#to_s]    :registry_uri (DEFAULT_REGISTRY)
     # @option options [Array] :debug
     #   Array to place debug messages
     # @return [reader]
@@ -59,24 +189,43 @@ module RDF::Microdata
       super do
         @debug = options[:debug]
-        @doc = case input
-        when Nokogiri::HTML::Document, Nokogiri::XML::Document
-          input
-        else
-          # Try to detect charset from input
-          options[:encoding] ||= input.charset if input.respond_to?(:charset)
-          # Otherwise, default is utf-8
-          options[:encoding] ||= 'utf-8'
+        @library = case options[:library]
+          when nil
+            (defined?(::Nokogiri) && RUBY_PLATFORM != 'java') ? :nokogiri : :rexml
+          when :nokogiri, :rexml
+            options[:library]
+          else
+            raise ArgumentError.new("expected :rexml or :nokogiri, but got #{options[:library].inspect}")
+        end
-          add_debug(nil, "base_uri: #{base_uri}")
-          Nokogiri::HTML.parse(input, base_uri.to_s, options[:encoding])
+        require "rdf/microdata/reader/#{@library}"
+        @implementation = case @library
+          when :nokogiri then Nokogiri
+          when :rexml    then REXML
         end
-        errors = @doc.errors.reject {|e| e.to_s =~ /Tag (audio|source|track|video|time) invalid/}
+        self.extend(@implementation)
+        initialize_html(input, options) rescue raise RDF::ReaderError.new($!.message)
+        if (root.nil? && validate?)
+          raise RDF::ReaderError, "Empty Document"
+        end
+        errors = doc_errors.reject {|e| e.to_s =~ /Tag (audio|source|track|video|time) invalid/}
         raise RDF::ReaderError, "Syntax errors:\n#{errors}" if !errors.empty? && validate?
-        raise RDF::ReaderError, "Empty document" if (@doc.nil? || @doc.root.nil?) && validate?
+        add_debug(@doc, "library = #{@library}")
+        # Load registry
+        unless Registry.loaded?
+          registry = options[:registry_uri] || DEFAULT_REGISTRY
+          begin
+            json = RDF::Util::File.open_file(registry) { |f| JSON.load(f) }
+          rescue JSON::ParserError => e
+            raise RDF::ReaderError, "Failed to parse registry: #{e.message}"
+          end
+          Registry.load_registry(json)
+        end
         if block_given?
           case block.arity
             when 0 then instance_eval(&block)
@@ -121,19 +270,19 @@ module RDF::Microdata
       @bnode_cache[value.to_s] ||= RDF::Node.new(value)
     end
-    # Figure out the document path, if it is a Nokogiri::XML::Element or Attribute
+    # Figure out the document path, if it is an Element or Attribute
     def node_path(node)
-      "<#{base_uri}>" + case node
-      when Nokogiri::XML::Node then node.display_path
-      else node.to_s
-      end
+      "<#{base_uri}>#{node.respond_to?(:display_path) ? node.display_path : node}"
     end
     # Add debug event to debug array, if specified
     #
-    # @param [XML Node, any] node:: XML Node or string for showing context
+    # @param [Nokogiri::XML::Node, #to_s] node:: XML Node or string for showing context
     # @param [String] message::
-    def add_debug(node, message)
+    # @yieldreturn [String] appended to message, to allow for lazy-evaulation of message
+    def add_debug(node, message = "")
+      return unless ::RDF::Microdata.debug? || @debug
+      message = message + yield if block_given?
       puts "#{node_path(node)}: #{message}" if ::RDF::Microdata::debug?
       @debug << "#{node_path(node)}: #{message}" if @debug.is_a?(Array)
     end
@@ -153,107 +302,50 @@ module RDF::Microdata
     # @raise [ReaderError]:: Checks parameter types and raises if they are incorrect if parsing mode is _validate_.
     def add_triple(node, subject, predicate, object)
       statement = RDF::Statement.new(subject, predicate, object)
-      add_debug(node, "statement: #{RDF::NTriples.serialize(statement)}")
+      add_debug(node) {"statement: #{RDF::NTriples.serialize(statement)}"}
       @callback.call(statement)
     end
     # Parsing a Microdata document (this is *not* the recursive method)
     def parse_whole_document(doc, base)
-      base_el = doc.at_css('html>head>base')
-      base = base_el.attribute('href').to_s.split('#').first if base_el
-      add_debug(doc, "parse_whole_doc: options=#{@options.inspect}")
-      if (base)
+      base = doc_base(base)
+      options[:base_uri] = if (base)
         # Strip any fragment from base
         base = base.to_s.split('#').first
-        base = options[:base_uri] = uri(base)
-        add_debug(base_el, "parse_whole_doc: base='#{base}'")
+        base = uri(base)
       else
         base = RDF::URI("")
       end
-      # 2. For each a, area, and link element in the Document, run these substeps:
+      add_debug(nil) {"parse_whole_doc: base='#{base}'"}
+      ec = {
+        :memory             => {},
+        :current_name       => nil,
+        :current_type       => nil,
+        :current_vocabulary => nil,
+        :document_base      => base,
+      }
+      items = []
+      # 1) For each element that is also a top-level item run the following algorithm:
       #
-      # * If the element does not have a rel attribute, then skip this element.
-      # * If the element does not have an href attribute, then skip this element.
-      # * If resolving the element's href attribute relative to the element is not successful,
-      #   then skip this element.
-      doc.css('a, area, link').each do |el|
-        rel, href = el.attribute('rel'), el.attribute('href')
-        next unless rel && href
-        href = uri(href, el.base || base)
-        add_debug(el, "a: rel=#{rel.inspect}, href=#{href}")
-        # Otherwise, split the value of the element's rel attribute on spaces, obtaining list of tokens.
-        # Coalesce duplicate tokens in list of tokens.
-        tokens = rel.to_s.split(/\s+/).map do |tok|
-          # Convert each token in list of tokens that does not contain a U+003A COLON characters (:)
-          # to ASCII lowercase.
-          tok =~ /:/ ? tok : tok.downcase
-        end.uniq
-        # If list of tokens contains both the tokens alternate and stylesheet,
-        # then remove them both and replace them with the single (uppercase) token
-        # ALTERNATE-STYLESHEET.
-        if tokens.include?('alternate') && tokens.include?('stylesheet')
-          tokens = tokens - %w(alternate stylesheet)
-          tokens << 'ALTERNATE-STYLESHEET'
-        end
-        tokens.each do |tok|
-          tok_uri = RDF::URI(tok)
-          if tok !~ /:/
-            # For each token token in list of tokens that contains no U+003A COLON characters (:),
-            # generate the following triple:
-            add_triple(el, base, RDF::XHV[tok.gsub('#', '%23')], href)
-          elsif tok_uri.absolute?
-            # For each token token in list of tokens that is an absolute URL, generate the following triple:
-            add_triple(el, base, tok_uri, href)
-          end
-        end
-      end
-      # 3. For each meta element in the Document that has a name attribute and a content attribute,
-      doc.css('meta[name][content]').each do |el|
-        name, content = el.attribute('name'), el.attribute('content')
-        name = name.to_s
-        name_uri = uri(name, el.base || base)
-        add_debug(el, "meta: name=#{name.inspect}")
-        if name !~ /:/
-          # If the value of the name attribute contains no U+003A COLON characters (:),
-          # generate the following triple:
-          add_triple(el, base, RDF::XHV[name.downcase.gsub('#', '%23')], RDF::Literal(content, :language => el.language))
-        elsif name_uri.absolute?
-          # If the value of the name attribute contains no U+003A COLON characters (:),
-          # generate the following triple:
-          add_triple(el, base, name_uri, RDF::Literal(content, :language => el.language))
-        end
-      end
-      # 4. For each blockquote and q element in the Document that has a cite attribute that resolves
-      #    successfully relative to the element, generate the following triple:
-      doc.css('blockquote[cite], q[cite]').each do |el|
-        object = uri(el.attribute('cite'), el.base || base)
-        add_debug(el, "blockquote: cite=#{object}")
-        add_triple(el, base, RDF::DC.source, object)
+      #   1) Generate the triples for an item item, using the evaluation context.
+      #      Let result be the (URI reference or blank node) subject returned.
+      #   2) Append result to item list.
+      getItems.each do |el|
+        result = generate_triples(el, ec)
+        items << result
       end
+      # 2) Generate an RDF Collection list from
+      #    the ordered list of values. Set value to the value returned from generate an RDF Collection.
+      value = generateRDFCollection(root, items)
-      # 5. Let memory be a mapping of items to subjects, initially empty.
-      # 6. For each element that is also a top-level microdata item, run the following steps:
-      #    * Generate the triples for the item. Pass a reference to memory as the item/subject list.
-      #      Let result be the subject returned.
-      #    * Generate the following triple:
-      #      subject    the document's current address
-      #      predicate  http://www.w3.org/1999/xhtml/microdata#item
-      #      object     result
-      memory = {}
-      doc.css('[itemscope]').
-        select {|el| !el.has_attribute?('itemprop')}.
-        each do |el|
-          object = generate_triples(el, memory)
-          add_triple(el, base, RDF::MD.item, object)
-      end
+      # 3) Generate the following triple:
+      #     subject Document base
+      #     predicate http://www.w3.org/1999/xhtml/microdata#item
+      #     object value
+      add_triple(doc, base, RDF::MD.item, value) if value
       add_debug(doc, "parse_whole_doc: traversal complete")
     end
@@ -261,94 +353,119 @@ module RDF::Microdata
     ##
     # Generate triples for an item
     # @param [RDF::Resource] item
-    # @param [Hash{Nokogiri::XML::Element} => RDF::Resource] memory
-    # @param [Hash{Symbol => Object}] options
-    # @option options [RDF::Resource] :fallback_type
-    # @option options [RDF::Resource] :fallback_name
+    # @param [Hash{Symbol => Object}] ec
+    # @option ec [Hash{Nokogiri::XML::Element} => RDF::Resource] memory
+    # @option ec [RDF::Resource] :current_type
     # @return [RDF::Resource]
-    def generate_triples(item, memory, options = {})
-      fallback_type = options[:fallback_type]
-      fallback_name = options[:fallback_name]
-      # 1. If there is an entry for item in memory, then let subject be the subject of that entry.
+    def generate_triples(item, ec = {})
+      memory = ec[:memory]
+      # 1) If there is an entry for item in memory, then let subject be the subject of that entry.
       #    Otherwise, if item has a global identifier and that global identifier is an absolute URL,
       #    let subject be that global identifier. Otherwise, let subject be a new blank node.
-      subject = if memory.include?(item)
-        memory[item][:subject]
+      subject = if memory.include?(item.node)
+        memory[item.node][:subject]
       elsif item.has_attribute?('itemid')
-        u = uri(item.attribute('itemid'), item.base || base_uri)
+        uri(item.attribute('itemid'), item.base || base_uri)
       end || RDF::Node.new
-      memory[item] ||= {}
+      memory[item.node] ||= {}
-      add_debug(item, "gentrips(2): subject=#{subject.inspect}")
+      add_debug(item) {"gentrips(2): subject=#{subject.inspect}, current_type: #{ec[:current_type]}"}
-      # 2. Add a mapping from item to subject in memory, if there isn't one already.
-      memory[item][:subject] ||= subject
+      # 2) Add a mapping from item to subject in memory, if there isn't one already.
+      memory[item.node][:subject] ||= subject
-      # 3. If item has an item type and that item type is an absolute URL, let type be that item type.
-      #    Otherwise, let type be the empty string.
-      rdf_type = type = uri(item.attribute('itemtype'))
-      type = '' unless type.absolute?
+      # 3) For each type returned from element.itemType of the element defining the item.
+      type = nil
+      item.attribute('itemtype').to_s.split(' ').map{|n| uri(n)}.select(&:absolute?).each do |t|
+        #   3.1. If type is an absolute URL, generate the following triple:
+        type ||= t
+        add_triple(item, subject, RDF.type, t)
+      end
-      if type != ''
-        add_triple(item, subject, RDF.type, type)
-        # 4.2. If type does not contain a U+0023 NUMBER SIGN character (#), then append a # to type.
-        type += '#' unless type.to_s.include?('#')
-        # 4.3. If type does not have a : after its #, append a : to type.
-        type += ':' unless type.to_s.match(/\#:/)
-      elsif fallback_type
-        add_debug(item, "gentrips(5.2): fallback_type=#{fallback_type}, fallback_name=#{fallback_name}")
-        rdf_type = type = fallback_type
-        # 5.2. If type does not contain a U+0023 NUMBER SIGN character (#), then append a # to type.
-        type += '#' unless type.to_s.include?('#')
-        # 5.3. If type does not have a : after its #, append a : to type.
-        type += ':' unless type.to_s.match(/\#:/)
-        # 5.4. If the last character of type is not a :, %20 to type.
-        type += '%20' unless type.to_s[-1,1] == ':'
-        # 5.5. Append the fragment-escaped value of fallback name to type.
-        type += fallback_name.to_s.gsub('#', '%23')
+      # 5) If type is not an absolute URL, set it to current type from the Evaluation Context if not empty.
+      type ||= ec[:current_type]
+      add_debug(item)  {"gentrips(5): type=#{type.inspect}"}
+      # 6) If the registry contains a URI prefix that is a character for character match of type up to the length of the
+      #    URI prefix, set vocab as that URI prefix
+      vocab = Registry.find(type)
+      # 7) Otherwise, if type is not empty, construct vocab by removing everything following the last
+      #    SOLIDUS U+002F ("/") or NUMBER SIGN U+0023 ("#") from type.
+      vocab ||= begin
+        type_vocab = type.to_s.sub(/([\/\#])[^\/\#]*$/, '\1')
+        add_debug(item)  {"gentrips(7): typtype_vocab=#{type_vocab.inspect}"}
+        Registry.new(type_vocab) # if type
       end
-      add_debug(item, "gentrips(6): type=#{type.inspect}")
-      # 6. For each element _element_ that has one or more property names and is one of the
+      # 8) Update evaluation context setting current vocabulary to vocab.
+      ec[:current_vocabulary] = vocab
+      # 9) Set property list to an empty mapping between properties and one or more ordered values as established below.
+      property_list = {}
+      # 10. For each element _element_ that has one or more property names and is one of the
       #    properties of the item _item_, in the order those elements are given by the algorithm
       #    that returns the properties of an item, run the following substep:
       props = item_properties(item)
-      # 6.1. For each name name in element's property names, run the following substeps:
+      # 10.1. For each name name in element's property names, run the following substeps:
       props.each do |element|
-        element.attribute('itemprop').to_s.split(' ').each do |name|
-          add_debug(element, "gentrips(6.1): name=#{name.inspect}")
-          # If type is the empty string and name is not an absolute URL, then abort these substeps.
-          name_uri = RDF::URI(name)
-          next if type == '' && !name_uri.absolute?
+        element.attribute('itemprop').to_s.split(' ').compact.each do |name|
+          add_debug(element) {"gentrips(10.1): name=#{name.inspect}, type=#{type}"}
+          # Let context be a copy of evaluation context with current type set to type and current vocabulary set to vocab.
+          ec_new = ec.merge({:current_type => type, :current_vocabulary => vocab})
+          predicate = vocab.predicateURI(name, ec_new)
+          ec_new[:current_name] = predicate
+          add_debug(element) {"gentrips(10.1.2): predicate=#{predicate}"}
+          # 10.1.3) Let value be the property value of element.
           value = property_value(element)
-          add_debug(element, "gentrips(6.1.2) value=#{value.inspect}")
+          add_debug(element) {"gentrips(10.1.3) value=#{value.inspect}"}
+          # 10.1.4) If value is an item, then generate the triples for value using a copy of evaluation context with
+          #       current type set to type. Replace value by the subject returned from those steps.
           if value.is_a?(Hash)
-            value = generate_triples(element, memory, :fallback_type => type, :fallback_name => name)
+            value = generate_triples(element, ec_new)
+            add_debug(element) {"gentrips(10.1.4): value=#{value.inspect}"}
           end
-          add_debug(element, "gentrips(6.1.3): value=#{value.inspect}")
-          predicate = if name_uri.absolute?
-            name_uri
-          else
-            # Use the URI of the type to create URIs for @itemprop terms
-            add_debug(element, "gentrips: rdf_type=#{rdf_type}")
-            predicate = RDF::URI(rdf_type.to_s.sub(/([\/\#])[^\/\#]*$/, '\1' + name))
-          end
-          add_debug(element, "gentrips(6.1.5): predicate=#{predicate}")
-          add_triple(element, subject, predicate, value) if predicate
+          property_list[predicate] ||= []
+          property_list[predicate] << value
         end
       end
+      # 11) For each predicate in property list
+      property_list.each do |predicate, values|
+        generatePropertyValues(item, subject, predicate, values, ec)
+      end
       subject
     end
+    def generatePropertyValues(element, subject, predicate, values, ec)
+      registry = ec[:current_vocabulary]
+      if registry.as_list(predicate)
+        value = generateRDFCollection(element, values)
+        add_triple(element, subject, predicate, value)
+      else
+        values.each {|v| add_triple(element, subject, predicate, v)}
+      end
+    end
+    ##
+    # Called when values has more than one entry
+    # @param [Nokogiri::HTML::Element] element
+    # @param [Array<RDF::Value>] values
+    # @return [RDF::Node]
+    def generateRDFCollection(element, values)
+      list = RDF::List.new(nil, nil, values)
+      list.each_statement do |st|
+        add_triple(element, st.subject, st.predicate, st.object) unless st.object == RDF.List
+      end
+      list.subject
+    end
     ##
     # To find the properties of an item defined by the element root, the user agent must try
     # to crawl the properties of the element root, with an empty list as the value of memory:
@@ -378,13 +495,14 @@ module RDF::Microdata
     # @return [Array<Array<Nokogiri::XML::Element>, Integer>]
     #   Resultant elements and error count
     def crawl_properties(root, memory)
       # 1. If root is in memory, then the algorithm fails; abort these steps.
       raise CrawlFailure, "crawl_props mem already has #{root.inspect}" if memory.include?(root)
       # 2. Collect all the elements in the item root; let results be the resulting
       #    list of elements, and errors be the resulting count of errors.
       results, errors = elements_in_item(root)
-      add_debug(root, "crawl_properties results=#{results.inspect}, errors=#{errors}")
+      add_debug(root) {"crawl_properties results=#{results.map {|e| node_path(e)}.inspect}, errors=#{errors}"}
       # 3. Remove any elements from results that do not have an itemprop attribute specified.
       results = results.select {|e| e.has_attribute?('itemprop')}
@@ -427,13 +545,13 @@ module RDF::Microdata
       # If root has an itemref attribute, split the value of that itemref attribute on spaces.
       # For each resulting token ID,
       root.attribute('itemref').to_s.split(' ').each do |id|
-        add_debug(root, "elements_in_item itemref id #{id}")
+        add_debug(root) {"elements_in_item itemref id #{id}"}
         # if there is an element in the home subtree of root with the ID ID,
         # then add the first such element to pending.
-        id_elem = @doc.at_css("##{id}")
+        id_elem = find_element_by_id(id)
         pending << id_elem if id_elem
       end
-      add_debug(root, "elements_in_item pending #{pending.inspect}")
+      add_debug(root) {"elements_in_item pending #{pending.inspect}"}
       # Loop: Remove an element from pending and let current be that element.
       while current = pending.shift
@@ -457,37 +575,42 @@ module RDF::Microdata
     ##
     #
     def property_value(element)
-      add_debug(element, "property_value(#{element.inspect}): base #{element.base.inspect}, base_uri: #{base_uri.inspect}")
-      case
+      base = element.base || base_uri
+      add_debug(element) {"property_value(#{element.name}): base #{base.inspect}"}
+      value = case
       when element.has_attribute?('itemscope')
         {}
       when element.name == 'meta'
-        element.attribute('content').to_s
+        RDF::Literal.new(element.attribute('content').to_s, :language => element.language)
+      when element.name == 'data'
+        RDF::Literal.new(element.attribute('value').to_s, :language => element.language)
       when %w(audio embed iframe img source track video).include?(element.name)
-        uri(element.attribute('src'), element.base || base_uri)
+        uri(element.attribute('src'), base)
       when %w(a area link).include?(element.name)
-        uri(element.attribute('href'), element.base || base_uri)
+        uri(element.attribute('href'), base)
       when %w(object).include?(element.name)
-        uri(element.attribute('data'), element.base || base_uri)
-      when %w(time).include?(element.name) && element.has_attribute?('datetime')
+        uri(element.attribute('data'), base)
+      when %w(time).include?(element.name)
         # Lexically scan value and assign appropriate type, otherwise, leave untyped
-        v = element.attribute('datetime').to_s
-        datatype = %w(Date Time DateTime).map {|t| RDF::Literal.const_get(t)}.detect do |dt|
+        v = (element.attribute('datetime') || element.text).to_s
+        datatype = %w(Date Time DateTime Duration).map {|t| RDF::Literal.const_get(t)}.detect do |dt|
           v.match(dt::GRAMMAR)
         end || RDF::Literal
-        datatype.new(v)
+        datatype.new(v, :language => element.language)
       else
-        RDF::Literal.new(element.text, :language => element.language)
+        RDF::Literal.new(element.inner_text, :language => element.language)
       end
+      add_debug(element) {"  #{value.inspect}"}
+      value
     end
     # Fixme, what about xml:base relative to element?
     def uri(value, base = nil)
       value = if base
         base = uri(base) unless base.is_a?(RDF::URI)
-        base.join(value)
+        base.join(value.to_s)
       else
-        RDF::URI(value)
+        RDF::URI(value.to_s)
       end
       value.validate! if validate?
       value.canonicalize! if canonicalize?