RubyGems - rdf-normalize - Versions diffs - 0.5.0 → 0.6.0 - Mend

rdf-normalize 0.5.0 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

checksums.yaml +4 -4
data/README.md +19 -10
data/VERSION +1 -1
data/lib/rdf/normalize/rdfc10.rb +390 -0
data/lib/rdf/normalize/urgna2012.rb +5 -7
data/lib/rdf/normalize/version.rb +1 -1
data/lib/rdf/normalize/writer.rb +2 -2
data/lib/rdf/normalize.rb +7 -6
metadata +15 -10
data/lib/rdf/normalize/urdna2015.rb +0 -263

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 60932e4daabb349e76fb838c46aeea30ea39ff2406d7fd2f08b4df20b22975e4
-  data.tar.gz: d1f1b87ac1a46ad18c7a1e848c8e7306b20cedbdb374cc1a197f8a2235875b27
+  metadata.gz: 15478756de443574bde6120436faf09bec1f7e40dcfc60f39fc97af92e686738
+  data.tar.gz: d5617da52a4d7e3429452f691e4a9ccb7f6ac8bedcef6dd66583b1322e0b57f0
 SHA512:
-  metadata.gz: 263602b7c6861bf74745600e5828f5095cb9a3a3f08974621d050982a78bc6d4d6e1c6e421cc69e5bc9fa0146cfd0521b0f4817989b8644b4d25fa02c13015dd
-  data.tar.gz: 98a1391dd232a2db5ea8353a51268fcfe9def16dc7a8f309bb305c7feeb9df85f8e96c8cc5dbcf4b3486e34d0133b33bdb353d2aa567bb948d412bb36be5e202
+  metadata.gz: 7c2ccd4449f12d5095702d19a8c1d27539aa5afa23c8b96ffcf6f43ee0d6d10fd763e2dbc98f2ef008ede3edc3fda1801eb6a1cd3ad0e80e3b82995017ae93e4
+  data.tar.gz: f760c7336703292679c82b6abbea86ffe7b8ac1b803508c187d8aee7bcd8cd635d0b039d928b7d145198f7df884027aeb911fa2e97e0e9d171cae92e4d26ed0b
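`checksums.yaml` records SHA256 and SHA512 digests of the two archives packed inside the `.gem` file, `metadata.gz` and `data.tar.gz`. Below is a minimal sketch for recomputing them with Ruby's standard library. It assumes the `.gem` is the usual uncompressed tar containing those two entries. The helper name `gem_archive_digests` is hypothetical.

```ruby
require 'digest'
require 'rubygems/package'

ARCHIVES = %w[metadata.gz data.tar.gz].freeze

# Hypothetical helper: read a .gem (a plain tar) from an IO and return
# { "SHA256" => { entry => hexdigest }, "SHA512" => { ... } } for the two
# archives listed in checksums.yaml. Other tar entries are ignored.
def gem_archive_digests(gem_io)
  digests = { 'SHA256' => {}, 'SHA512' => {} }
  Gem::Package::TarReader.new(gem_io) do |tar|
    tar.each do |entry|
      next unless ARCHIVES.include?(entry.full_name)

      bytes = entry.read.to_s # read can return nil for an empty entry
      digests['SHA256'][entry.full_name] = Digest::SHA256.hexdigest(bytes)
      digests['SHA512'][entry.full_name] = Digest::SHA512.hexdigest(bytes)
    end
  end
  digests
end
```

For example, `File.open('rdf-normalize-0.6.0.gem', 'rb') { |f| gem_archive_digests(f) }` should reproduce the `+` values above if the file is the published 0.6.0 gem.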

data/README.md CHANGED

@@ -1,13 +1,13 @@
 # RDF::Normalize
 RDF Graph normalizer for [RDF.rb][RDF.rb].
-[![Gem Version](https://badge.fury.io/rb/rdf-normalize.png)](https://badge.fury.io/rb/rdf-normalize)
+[![Gem Version](https://badge.fury.io/rb/rdf-normalize.svg)](https://badge.fury.io/rb/rdf-normalize)
 [![Build Status](https://github.com/ruby-rdf/rdf-normalize/workflows/CI/badge.svg?branch=develop)](https://github.com/ruby-rdf/rdf-normalize/actions?query=workflow%3ACI)
 [![Coverage Status](https://coveralls.io/repos/ruby-rdf/rdf-normalize/badge.svg?branch=develop)](https://coveralls.io/github/ruby-rdf/rdf-normalize?branch=develop)
 [![Gitter chat](https://badges.gitter.im/ruby-rdf/rdf.png)](https://gitter.im/ruby-rdf/rdf)
 ## Description
-This is a [Ruby][] implementation of a [RDF Normalize][] for [RDF.rb][].
+This is a [Ruby][] implementation of a [RDF Dataset Canonicalization][] for [RDF.rb][].
 ## Features
 RDF::Normalize generates normalized [N-Quads][] output for an RDF Dataset using the algorithm
@@ -16,8 +16,8 @@ to serialize normalized statements.
 Algorithms implemented:
-* [URGNA2012](https://json-ld.github.io/normalization/spec/index.html#dfn-urgna2012)
-* [URDNA2015](https://json-ld.github.io/normalization/spec/index.html#dfn-urdna2015)
+* [URGNA2012](https://www.w3.org/TR/rdf-canon/#dfn-urgna2012)
+* [RDFC-1.0](https://www.w3.org/TR/rdf-canon/#dfn-rdfc-1-0)
 Install with `gem install rdf-normalize`
@@ -27,7 +27,17 @@ Install with `gem install rdf-normalize`
 ## Usage
 ## Documentation
-Full documentation available on [Rubydoc.info][Normalize doc]
+Full documentation available on [GitHub][Normalize doc]
+## Examples
+### Returning normalized N-Quads
+    require 'rdf/normalize'
+    require 'rdf/turtle'
+    g = RDF::Graph.load("etc/doap.ttl")
+    puts g.dump(:normalize)
 ### Principle Classes
 * {RDF::Normalize}
@@ -35,8 +45,7 @@ Full documentation available on [Rubydoc.info][Normalize doc]
   * {RDF::Normalize::Format}
   * {RDF::Normalize::Writer}
   * {RDF::Normalize::URGNA2012}
-  * {RDF::Normalize::URDNA2015}
+  * {RDF::Normalize::RDFC10}
 ## Dependencies
@@ -80,7 +89,7 @@ see <https://unlicense.org/> or the accompanying {file:LICENSE} file.
 [YARD]:         https://yardoc.org/
 [YARD-GS]:      https://rubydoc.info/docs/yard/file/docs/GettingStarted.md
 [PDD]:              https://unlicense.org/#unlicensing-contributions
-[RDF.rb]:       https://rubydoc.info/github/ruby-rdf/rdf-normalize
+[RDF.rb]:       https://ruby-rdf.github.io/rdf-normalize
 [N-Triples]:    https://www.w3.org/TR/rdf-testcases/#ntriples
-[RDF Normalize]:https://json-ld.github.io/normalization/spec/
-[Normalize doc]:https://rubydoc.info/github/ruby-rdf/rdf-normalize/master
+[RDF Dataset Canonicalization]: https://www.w3.org/TR/rdf-canon/
+[Normalize doc]: https://ruby-rdf.github.io/rdf-normalize/

data/VERSION CHANGED

@@ -1 +1 @@
-0.5.0
+0.6.0

data/lib/rdf/normalize/rdfc10.rb ADDED

@@ -0,0 +1,390 @@
+require 'rdf/nquads'
+begin
+  require 'json'
+rescue LoadError
+  # Used for debug output
+end
+module RDF::Normalize
+  class RDFC10
+    include RDF::Enumerable
+    include RDF::Util::Logger
+    include Base
+    ##
+    # Create an enumerable with grounded nodes
+    #
+    # @param [RDF::Enumerable] enumerable
+    # @return [RDF::Enumerable]
+    def initialize(enumerable, **options)
+      @dataset, @options = enumerable, options
+    end
+    def each(&block)
+      ns = NormalizationState.new(@options)
+      log_debug("ca:")
+      log_debug("  log point", "Entering the canonicalization function (4.5.3).")
+      log_depth(depth: 2) {normalize_statements(ns, &block)}
+    end
+    protected
+    def normalize_statements(ns, &block)
+      # Step 2: Map BNodes to the statements they are used by
+      dataset.each_statement do |statement|
+        statement.to_quad.compact.select(&:node?).each do |node|
+          ns.add_statement(node, statement)
+        end
+      end
+      log_debug("ca.2:")
+      log_debug("  log point", "Extract quads for each bnode (4.5.3 (2)).")
+      log_debug("  Bnode to quads:")
+      if logger && logger.level == 0
+        ns.bnode_to_statements.each do |bn, statements|
+          log_debug("    #{bn.id}:")
+          statements.each do |s|
+            log_debug {"      - #{s.to_nquads.strip}"}
+          end
+        end
+      end
+      ns.hash_to_bnodes = {}
+      # Step 3: Calculate hashes for first degree nodes
+      log_debug("ca.3:")
+      log_debug("  log point", "Calculated first degree hashes (4.5.3 (3)).")
+      log_debug("  with:")
+      ns.bnode_to_statements.each_key do |node|
+        log_debug("    - identifier") {node.id}
+        log_debug("      h1dq:")
+        hash = log_depth(depth: 8) {ns.hash_first_degree_quads(node)}
+        ns.add_bnode_hash(node, hash)
+      end
+      # Step 4: Create canonical replacements for hashes mapping to a single node
+      log_debug("ca.4:")
+      log_debug("  log point", "Create canonical replacements for hashes mapping to a single node (4.5.3 (4)).")
+      log_debug("  with:") unless ns.hash_to_bnodes.empty?
+      ns.hash_to_bnodes.keys.sort.each do |hash|
+        identifier_list = ns.hash_to_bnodes[hash]
+        next if identifier_list.length > 1
+        node = identifier_list.first
+        id = ns.canonical_issuer.issue_identifier(node)
+        log_debug("    - identifier") {node.id}
+        log_debug("      hash", hash)
+        log_debug("      canonical label", id)
+        ns.hash_to_bnodes.delete(hash)
+      end
+      # Step 5: Iterate over hashs having more than one node
+      log_debug("ca.5:") unless ns.hash_to_bnodes.empty?
+      log_debug("  log point", "Calculate hashes for identifiers with shared hashes (4.5.3 (5)).")
+      log_debug("  with:") unless ns.hash_to_bnodes.empty?
+      ns.hash_to_bnodes.keys.sort.each do |hash|
+        identifier_list = ns.hash_to_bnodes[hash]
+        log_debug("    - hash", hash)
+        log_debug("      identifier list") {identifier_list.map(&:id).to_json(indent: ' ')}
+        hash_path_list = []
+        # Create a hash_path_list for all bnodes using a temporary identifier used to create canonical replacements
+        log_debug("      ca.5.2:")
+        log_debug("        log point", "Calculate hashes for identifiers with shared hashes (4.5.3 (5.2)).")
+        log_debug("        with:") unless identifier_list.empty?
+        identifier_list.each do |identifier|
+          next if ns.canonical_issuer.issued.include?(identifier)
+          temporary_issuer = IdentifierIssuer.new("b")
+          temporary_issuer.issue_identifier(identifier)
+          log_debug("          - identifier") {identifier.id}
+          hash_path_list << log_depth(depth: 12) {ns.hash_n_degree_quads(identifier, temporary_issuer)}
+        end
+        # Create canonical replacements for nodes
+        log_debug("      ca.5.3:") unless hash_path_list.empty?
+        log_debug("        log point", "Canonical identifiers for temporary identifiers (4.5.3 (5.3)).")
+        log_debug("        issuer:") unless hash_path_list.empty?
+        hash_path_list.sort_by(&:first).each do |result, issuer|
+          issuer.issued.each do |node|
+            id = ns.canonical_issuer.issue_identifier(node)
+            log_debug("            - blank node") {node.id}
+            log_debug("              canonical identifier", id)
+          end
+        end
+      end
+      # Step 6: Yield statements using BNodes from canonical replacements
+      dataset.each_statement do |statement|
+        if statement.has_blank_nodes?
+          quad = statement.to_quad.compact.map do |term|
+            term.node? ? RDF::Node.intern(ns.canonical_issuer.identifier(term)) : term
+          end
+          block.call RDF::Statement.from(quad)
+        else
+          block.call statement
+        end
+      end
+      log_debug("ca.6:")
+      log_debug("  log point", "Replace original with canonical labels (4.5.3 (6)).")
+      log_debug("  canonical issuer: #{ns.canonical_issuer.inspect}")
+      dataset
+    end
+  private
+    class NormalizationState
+      include RDF::Util::Logger
+      attr_accessor :bnode_to_statements
+      attr_accessor :hash_to_bnodes
+      attr_accessor :canonical_issuer
+      def initialize(options)
+        @options = options
+        @bnode_to_statements, @hash_to_bnodes, @canonical_issuer = {}, {}, IdentifierIssuer.new("c14n")
+      end
+      def add_statement(node, statement)
+        bnode_to_statements[node] ||= []
+        bnode_to_statements[node] << statement unless bnode_to_statements[node].any? {|st| st.eql?(statement)}
+      end
+      def add_bnode_hash(node, hash)
+        hash_to_bnodes[hash] ||= []
+        # Match on object IDs of nodes, rather than simple node equality
+        hash_to_bnodes[hash] << node unless hash_to_bnodes[hash].any? {|n| n.eql?(node)}
+      end
+      # This algorithm calculates a hash for a given blank node across the quads in a dataset in which that blank node is a component. If the hash uniquely identifies that blank node, no further examination is necessary. Otherwise, a hash will be created for the blank node using the algorithm in [4.9 Hash N-Degree Quads](https://w3c.github.io/rdf-canon/spec/#hash-nd-quads) invoked via [4.5 Canonicalization Algorithm](https://w3c.github.io/rdf-canon/spec/#canon-algorithm).
+      #
+      # @param [RDF::Node] node The reference blank node identifier
+      # @return [String] the SHA256 hexdigest hash of statements using this node, with replacements
+      def hash_first_degree_quads(node)
+        nquads = bnode_to_statements[node].
+          map do |statement|
+            quad = statement.to_quad.map do |t|
+              case t
+              when node then RDF::Node("a")
+              when RDF::Node then RDF::Node("z")
+              else t
+              end
+            end
+            RDF::Statement.from(quad).to_nquads
+          end
+        log_debug("log point", "Hash First Degree Quads function (4.7.3).")
+        log_debug("nquads:")
+        nquads.each do |q|
+          log_debug {"  - #{q.strip}"}
+        end
+        result = hexdigest(nquads.sort.join)
+        log_debug("hash") {result}
+        result
+      end
+      # @param [RDF::Node] related
+      # @param [RDF::Statement] statement
+      # @param [IdentifierIssuer] issuer
+      # @param [String] position one of :s, :o, or :g
+      # @return [String] the SHA256 hexdigest hash
+      def hash_related_node(related, statement, issuer, position)
+        log_debug("related") {related.id}
+        input = "#{position}"
+        input << statement.predicate.to_ntriples unless position == :g
+        if identifier = (canonical_issuer.identifier(related) ||
+                         issuer.identifier(related))
+          input << "_:#{identifier}"
+        else
+          log_debug("h1dq:")
+          input << log_depth(depth: 2) do
+            hash_first_degree_quads(related)
+          end
+        end
+        log_debug("input") {input.inspect}
+        log_debug("hash") {hexdigest(input)}
+        hexdigest(input)
+      end
+      # @param [RDF::Node] identifier
+      # @param [IdentifierIssuer] issuer
+      # @return [Array<String,IdentifierIssuer>] the Hash and issuer
+      def hash_n_degree_quads(identifier, issuer)
+        log_debug("hndq:")
+        log_debug("  log point", "Hash N-Degree Quads function (4.9.3).")
+        log_debug("  identifier") {identifier.id}
+        log_debug("  issuer") {issuer.inspect}
+        # hash to related blank nodes map
+        hn = {}
+        log_debug("  hndq.2:")
+        log_debug("    log point", "Quads for identifier (4.9.3 (2)).")
+        log_debug("    quads:")
+        bnode_to_statements[identifier].each do |s|
+          log_debug {"    - #{s.to_nquads.strip}"}
+        end
+        # Step 3
+        log_debug("  hndq.3:")
+        log_debug("    log point", "Hash N-Degree Quads function (4.9.3 (3)).")
+        log_debug("    with:") unless bnode_to_statements[identifier].empty?
+        bnode_to_statements[identifier].each do |statement|
+          log_debug {"      - quad: #{statement.to_nquads.strip}"}
+          log_debug("        hndq.3.1:")
+          log_debug("          log point", "Hash related bnode component (4.9.3 (3.1))")
+          log_depth(depth: 10) {hash_related_statement(identifier, statement, issuer, hn)}
+        end
+        log_debug("    Hash to bnodes:")
+        hn.each do |k,v|
+          log_debug("      #{k}:")
+          v.each do |vv|
+            log_debug("        - #{vv.id}")
+          end
+        end
+        data_to_hash = ""
+        # Step 5
+        log_debug("  hndq.5:")
+        log_debug("    log point", "Hash N-Degree Quads function (4.9.3 (5)), entering loop.")
+        log_debug("    with:")
+        hn.keys.sort.each do |hash|
+          log_debug("      - related hash", hash)
+          log_debug("        data to hash") {data_to_hash.to_json}
+          list = hn[hash]
+          # Iterate over related nodes
+          chosen_path, chosen_issuer = "", nil
+          data_to_hash += hash
+          log_debug("        hndq.5.4:")
+          log_debug("          log point", "Hash N-Degree Quads function (4.9.3 (5.4)), entering loop.")
+          log_debug("          with:") unless list.empty?
+          list.permutation do |permutation|
+            log_debug("          - perm") {permutation.map(&:id).to_json(indent: ' ', space: ' ')}
+            issuer_copy, path, recursion_list = issuer.dup, "", []
+            log_debug("            hndq.5.4.4:")
+            log_debug("              log point", "Hash N-Degree Quads function (4.9.3 (5.4.4)), entering loop.")
+            log_debug("              with:")
+            permutation.each do |related|
+              log_debug("                - related") {related.id}
+              log_debug("                  path") {path.to_json}
+              if canonical_issuer.identifier(related)
+                path << '_:' + canonical_issuer.issue_identifier(related)
+              else
+                recursion_list << related if !issuer_copy.identifier(related)
+                path << '_:' + issuer_copy.issue_identifier(related)
+              end
+              # Skip to the next permutation if chosen path isn't empty and the path is greater than the chosen path
+              break if !chosen_path.empty? && path.length >= chosen_path.length
+            end
+            log_debug("            hndq.5.4.5:")
+            log_debug("              log point", "Hash N-Degree Quads function (4.9.3 (5.4.5)), before possible recursion.")
+            log_debug("              recursion list") {recursion_list.map(&:id).to_json(indent: ' ')}
+            log_debug("              path") {path.to_json}
+            log_debug("              with:") unless recursion_list.empty?
+            recursion_list.each do |related|
+              log_debug("                - related") {related.id}
+              result = log_depth(depth: 18) {hash_n_degree_quads(related, issuer_copy)}
+              path << '_:' + issuer_copy.issue_identifier(related)
+              path << "<#{result.first}>"
+              issuer_copy = result.last
+              log_debug("                  hndq.5.4.5.4:")
+              log_debug("                    log point", "Hash N-Degree Quads function (4.9.3 (5.4.5.4)), combine result of recursion.")
+              log_debug("                    path") {path.to_json}
+              log_debug("                    issuer copy") {issuer_copy.inspect}
+              break if !chosen_path.empty? && path.length >= chosen_path.length && path > chosen_path
+            end
+            if chosen_path.empty? || path < chosen_path
+              chosen_path, chosen_issuer = path, issuer_copy
+            end
+          end
+          data_to_hash += chosen_path
+          log_debug("        hndq.5.5:")
+          log_debug("          log point", "Hash N-Degree Quads function (4.9.3 (5.5). End of current loop with Hn hashes.")
+          log_debug("          chosen path") {chosen_path.to_json}
+          log_debug("          data to hash") {data_to_hash.to_json}
+          issuer = chosen_issuer
+        end
+        log_debug("  hndq.6:")
+        log_debug("    log point", "Leaving Hash N-Degree Quads function (4.9.3).")
+        log_debug("    hash") {hexdigest(data_to_hash)}
+        log_depth(depth: 4) {log_debug("issuer") {issuer.inspect}}
+        return [hexdigest(data_to_hash), issuer]
+      end
+      def inspect
+        "NormalizationState:\nbnode_to_statements: #{inspect_bnode_to_statements}\nhash_to_bnodes: #{inspect_hash_to_bnodes}\ncanonical_issuer: #{canonical_issuer.inspect}"
+      end
+      def inspect_bnode_to_statements
+        bnode_to_statements.map do |n, statements|
+          "#{n.id}: #{statements.map {|s| s.to_nquads.strip}}"
+        end.join(", ")
+      end
+      def inspect_hash_to_bnodes
+      end
+      protected
+      def hexdigest(val)
+        Digest::SHA256.hexdigest(val)
+      end
+      # Group adjacent bnodes by hash
+      def hash_related_statement(identifier, statement, issuer, map)
+        log_debug("with:") if statement.to_h.values.any? {|t| t.is_a?(RDF::Node)}
+        statement.to_h(:s, :p, :o, :g).each do |pos, term|
+          next if !term.is_a?(RDF::Node) || term == identifier
+          log_debug("  - position", pos)
+          hash = log_depth(depth: 4) {hash_related_node(term, statement, issuer, pos)}
+          map[hash] ||= []
+          map[hash] << term unless map[hash].any? {|n| n.eql?(term)}
+        end
+      end
+    end
+    class IdentifierIssuer
+      def initialize(prefix = "c14n")
+        @prefix, @counter, @issued = prefix, 0, {}
+      end
+      # Return an identifier for this BNode
+      # @param [RDF::Node] node
+      # @return [String] Canonical identifier for node
+      def issue_identifier(node)
+        @issued[node] ||= begin
+          res, @counter = @prefix + @counter.to_s, @counter + 1
+          res
+        end
+      end
+      def issued
+        @issued.keys
+      end
+      # @return [RDF::Node] Canonical identifier assigned to node
+      def identifier(node)
+        @issued[node]
+      end
+      # Duplicate this issuer, ensuring that the issued identifiers remain distinct
+      # @return [IdentifierIssuer]
+      def dup
+        other = super
+        other.instance_variable_set(:@issued, @issued.dup)
+        other
+      end
+      def inspect
+        "{#{@issued.map {|k,v| "#{k.id}: #{v}"}.join(', ')}}"
+      end
+    end
+  end
+end
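The new `RDFC10` class implements W3C RDF Dataset Canonicalization. The following stdlib-only sketch illustrates two of its building blocks: the identifier issuer, and the Hash First Degree Quads step with the canonical labelling of uniquely hashed nodes (log points 4.5.3 (3) and (4)). It is not the gem's code. It makes these assumptions:

* Quads are arrays of already-serialized N-Quads terms.
* Blank nodes are written as `_:label`.
* An N-Quads line can be approximated by joining terms with spaces.

All names (`SketchIssuer`, `hash_first_degree_quads`, `label_unique_nodes`) are hypothetical.

```ruby
require 'digest'

# Hypothetical stand-in for IdentifierIssuer: issues prefix + counter
# labels in first-come order and remembers what it has issued.
class SketchIssuer
  def initialize(prefix = 'c14n')
    @prefix, @counter, @issued = prefix, 0, {}
  end

  def issue_identifier(node)
    @issued[node] ||= begin
      label = "#{@prefix}#{@counter}"
      @counter += 1
      label
    end
  end

  def identifier(node)
    @issued[node]
  end

  def issued
    @issued.keys
  end

  # dup copies instance variables shallowly; give the copy its own map so
  # issuing on the copy leaves the original untouched.
  def initialize_copy(source)
    super
    @issued = @issued.dup
  end
end

# Hash First Degree Quads, simplified: take the quads that mention
# +reference+, rewrite it to _:a and every other blank node to _:z,
# sort the resulting lines and return their SHA-256 hex digest.
def hash_first_degree_quads(quads, reference)
  lines = quads.select { |q| q.include?(reference) }.map do |quad|
    terms = quad.compact.map do |term|
      if term == reference then '_:a'
      elsif term.start_with?('_:') then '_:z'
      else term
      end
    end
    "#{terms.join(' ')} .\n"
  end
  Digest::SHA256.hexdigest(lines.sort.join)
end

# Canonical labelling, simplified: walk the hashes in sorted order and
# label only blank nodes whose first-degree hash is unique. Nodes sharing
# a hash stay unlabeled (the real class continues with Hash N-Degree Quads).
def label_unique_nodes(quads)
  bnodes = quads.flatten.compact.select { |t| t.start_with?('_:') }.uniq
  by_hash = bnodes.group_by { |node| hash_first_degree_quads(quads, node) }
  issuer = SketchIssuer.new('c14n')
  by_hash.keys.sort.each do |hash|
    nodes = by_hash[hash]
    issuer.issue_identifier(nodes.first) if nodes.length == 1
  end
  issuer
end
```

The diff overrides `dup` in `IdentifierIssuer`, and the sketch uses `initialize_copy` instead. Both give the copy its own issued-identifier hash, which is what lets temporary issuers be forked safely during the N-degree search.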

data/lib/rdf/normalize/urgna2012.rb CHANGED

@@ -1,12 +1,12 @@
 module RDF::Normalize
-  class URGNA2012 < URDNA2015
+  class URGNA2012 < RDFC10
     def each(&block)
       ns = NormalizationState.new(@options)
       normalize_statements(ns, &block)
     end
-    class NormalizationState < URDNA2015::NormalizationState
+    class NormalizationState < RDFC10::NormalizationState
       protected
       # 2012 version uses SHA-1
@@ -23,9 +23,7 @@ module RDF::Normalize
         identifier = canonical_issuer.identifier(related) ||
                      issuer.identifier(related) ||
                      hash_first_degree_quads(related)
-        input = position.to_s
-        input << statement.predicate.to_s
-        input << identifier
+        input = "#{position}#{statement.predicate}#{identifier}"
         log_debug("hrel") {"input: #{input.inspect}, hash: #{hexdigest(input)}"}
         hexdigest(input)
       end
@@ -35,11 +33,11 @@ module RDF::Normalize
         if statement.subject.node? && statement.subject != identifier
           hash = log_depth {hash_related_node(statement.subject, statement, issuer, :p)}
           map[hash] ||= []
-          map[hash] << statement.subject unless map[hash].include?(statement.subject)
+          map[hash] << statement.subject unless map[hash].any? {|n| n.eql?(statement.subject)}
         elsif statement.object.node? && statement.object != identifier
           hash = log_depth {hash_related_node(statement.object, statement, issuer, :r)}
           map[hash] ||= []
-          map[hash] << statement.object unless map[hash].include?(statement.object)
+          map[hash] << statement.object unless map[hash].any? {|n| n.eql?(statement.object)}
         end
       end
     end
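Both `rdfc10.rb` and `urgna2012.rb` now collect nodes with `any? { |n| n.eql?(...) }` instead of `include?`. The comment in `rdfc10.rb` says the aim is to "Match on object IDs of nodes, rather than simple node equality". The relevant Ruby detail is that `Array#include?` tests with `==`.

The sketch below illustrates the difference with a hypothetical `SketchNode` class. Its `==` compares labels, while `eql?` keeps Ruby's default object-identity behavior. This is an assumption standing in for however `RDF::Node` defines the two methods.

```ruby
# Hypothetical blank-node class: == compares labels, while eql? (not
# overridden) keeps Object's default, which is object identity.
class SketchNode
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def ==(other)
    other.is_a?(SketchNode) && other.id == id
  end
end

# Append +node+ unless this exact object is already in +list+, mirroring
# the updated append logic in the diff.
def add_unique(list, node)
  list << node unless list.any? { |n| n.eql?(node) }
  list
end

first  = SketchNode.new('a')
second = SketchNode.new('a') # distinct object with the same label

[first].include?(second)            # => true  (uses ==)
[first].any? { |n| n.eql?(second) } # => false (identity)
add_unique([first], second).length  # => 2
add_unique([first], first).length   # => 1
```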

data/lib/rdf/normalize/version.rb CHANGED

@@ -1,5 +1,5 @@
 module RDF::Normalize::VERSION
-  VERSION_FILE = File.join(File.expand_path(File.dirname(__FILE__)), "..", "..", "..", "VERSION")
+  VERSION_FILE = File.expand_path("../../../../VERSION", __FILE__)
   MAJOR, MINOR, TINY, EXTRA = File.read(VERSION_FILE).chop.split(".")
   STRING = [MAJOR, MINOR, TINY, EXTRA].compact.join('.')
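The new `VERSION_FILE` expression is shorter but should resolve to the same file. `File.expand_path` treats `__FILE__` as a path component, so the first `..` drops `version.rb` itself. The remaining three climb from `lib/rdf/normalize` to the gem root, matching the old `dirname` plus three `..` segments.

The sketch below checks this on a POSIX system under an assumed install path. It also exercises the unchanged parsing lines. The path and the method names are hypothetical.

```ruby
# Hypothetical helpers comparing the old and new VERSION_FILE expressions
# for a given location of lib/rdf/normalize/version.rb.
def old_version_file(file)
  File.join(File.expand_path(File.dirname(file)), '..', '..', '..', 'VERSION')
end

def new_version_file(file)
  File.expand_path('../../../../VERSION', file)
end

# Same parsing as version.rb: drop the last character (the newline),
# split on dots, and rejoin only the parts that are present.
def version_string(text)
  major, minor, tiny, extra = text.chop.split('.')
  [major, minor, tiny, extra].compact.join('.')
end

file = '/gems/rdf-normalize-0.6.0/lib/rdf/normalize/version.rb'
File.expand_path(old_version_file(file)) # => "/gems/rdf-normalize-0.6.0/VERSION"
new_version_file(file)                   # => "/gems/rdf-normalize-0.6.0/VERSION"
version_string("0.6.0\n")                # => "0.6.0"
```

`chop` removes the last character unconditionally, so this parsing relies on `VERSION` ending with a newline. Without one, `"0.6.0".chop` gives `"0.6."`.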

data/lib/rdf/normalize/writer.rb CHANGED

@@ -4,7 +4,7 @@ module RDF::Normalize
   #
   # Normalizes the enumerated statements into normal form in the form of N-Quads.
   #
-  # @author [Gregg Kellogg](http://greggkellogg.net/)
+  # @author [Gregg Kellogg](https://greggkellogg.net/)
   class Writer < RDF::NQuads::Writer
     format RDF::Normalize::Format
@@ -53,7 +53,7 @@ module RDF::Normalize
     #
     # @return [void]
     def write_epilogue
-      statements = RDF::Normalize.new(@repo, **@options).
+      RDF::Normalize.new(@repo, **@options).
         statements.
         reject(&:variable?).
         map {|s| format_statement(s)}.

data/lib/rdf/normalize.rb CHANGED

@@ -1,4 +1,5 @@
 require 'rdf'
+require 'digest'
 module RDF
   ##
@@ -25,13 +26,13 @@ module RDF
   #     writer << RDF::Repository.load("etc/doap.ttl")
   #   end
   #
-  # @author [Gregg Kellogg](http://greggkellogg.net/)
+  # @author [Gregg Kellogg](https://greggkellogg.net/)
   module Normalize
     require  'rdf/normalize/format'
     autoload :Base,       'rdf/normalize/base'
     autoload :Carroll2001,'rdf/normalize/carroll2001'
     autoload :URGNA2012,  'rdf/normalize/urgna2012'
-    autoload :URDNA2015,  'rdf/normalize/urdna2015'
+    autoload :RDFC10,     'rdf/normalize/rdfc10'
     autoload :VERSION,    'rdf/normalize/version'
     autoload :Writer,     'rdf/normalize/writer'
@@ -42,19 +43,19 @@ module RDF
     ALGORITHMS = {
       carroll2001: :Carroll2001,
       urgna2012:   :URGNA2012,
-      urdna2015:   :URDNA2015
+      rdfc10:   :RDFC10
     }.freeze
     ##
     # Creates a new normalizer instance using either the specified or default normalizer algorithm
     # @param [RDF::Enumerable] enumerable
     # @param [Hash{Symbol => Object}] options
-    # @option options [Base] :algorithm (:urdna2015)
-    #   One of `:carroll2001`, `:urgna2012`, or `:urdna2015`
+    # @option options [Base] :algorithm (:rdfc10)
+    #   One of `:carroll2001`, `:urgna2012`, or `:rdfc10`
     # @return [RDF::Normalize::Base]
     # @raise [ArgumentError] selected algorithm not defined
     def new(enumerable, **options)
-      algorithm = options.fetch(:algorithm, :urdna2015)
+      algorithm = options.fetch(:algorithm, :rdfc10)
       raise ArgumentError, "No algoritm defined for #{algorithm.to_sym}" unless ALGORITHMS.has_key?(algorithm)
       algorithm_class = const_get(ALGORITHMS[algorithm])
       algorithm_class.new(enumerable, **options)
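`RDF::Normalize.new` picks an algorithm class through the `ALGORITHMS` table, and its default is now `:rdfc10`. The `urdna2015:` entry is removed rather than kept as an alias. Going by this diff, a caller that still passes `algorithm: :urdna2015` would therefore reach the `ArgumentError` branch.

Below is a self-contained sketch of the same lookup pattern. Hypothetical stub classes stand in for the real algorithms.

```ruby
module SketchCanon
  # Hypothetical stand-in for an algorithm class; it only records inputs.
  class Algorithm
    attr_reader :enumerable, :options

    def initialize(enumerable, **options)
      @enumerable, @options = enumerable, options
    end
  end

  class Carroll2001 < Algorithm; end
  class URGNA2012 < Algorithm; end
  class RDFC10 < Algorithm; end

  # Same shape as ALGORITHMS in the diff: option symbol => constant name.
  ALGORITHMS = {
    carroll2001: :Carroll2001,
    urgna2012:   :URGNA2012,
    rdfc10:      :RDFC10
  }.freeze

  # Mirrors the dispatch in RDF::Normalize.new: default to :rdfc10, reject
  # unknown keys, then resolve the class by constant name.
  def self.new(enumerable, **options)
    algorithm = options.fetch(:algorithm, :rdfc10)
    unless ALGORITHMS.key?(algorithm)
      raise ArgumentError, "No algorithm defined for #{algorithm}"
    end
    const_get(ALGORITHMS[algorithm]).new(enumerable, **options)
  end
end
```

For example, `SketchCanon.new([])` returns a `SketchCanon::RDFC10`, and `SketchCanon.new([], algorithm: :urdna2015)` raises `ArgumentError`.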

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rdf-normalize
 version: !ruby/object:Gem::Version
-  version: 0.5.0
+  version: 0.6.0
 platform: ruby
 authors:
 - Gregg Kellogg
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2021-12-07 00:00:00.000000000 Z
+date: 2023-06-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rdf
@@ -53,21 +53,21 @@ dependencies:
       - !ruby/object:Gem::Version
         version: '3.10'
 - !ruby/object:Gem::Dependency
-  name: webmock
+  name: json-ld
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '3.11'
+        version: '3.2'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '3.11'
+        version: '3.2'
 - !ruby/object:Gem::Dependency
-  name: json-ld
+  name: rdf-trig
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
@@ -94,7 +94,7 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '0.9'
-description: RDF::Normalize is a Graph normalizer for the RDF.rb library suite.
+description: RDF::Normalize performs Dataset Canonicalization for RDF.rb.
 email: public-rdf-ruby@w3.org
 executables: []
 extensions: []
@@ -108,14 +108,19 @@ files:
 - lib/rdf/normalize/base.rb
 - lib/rdf/normalize/carroll2001.rb
 - lib/rdf/normalize/format.rb
-- lib/rdf/normalize/urdna2015.rb
+- lib/rdf/normalize/rdfc10.rb
 - lib/rdf/normalize/urgna2012.rb
 - lib/rdf/normalize/version.rb
 - lib/rdf/normalize/writer.rb
 homepage: https://github.com/ruby-rdf/rdf-normalize
 licenses:
 - Unlicense
-metadata: {}
+metadata:
+  documentation_uri: https://ruby-rdf.github.io/rdf-normalize
+  bug_tracker_uri: https://github.com/ruby-rdf/rdf-normalize/issues
+  homepage_uri: https://github.com/ruby-rdf/rdf-normalize
+  mailing_list_uri: https://lists.w3.org/Archives/Public/public-rdf-ruby/
+  source_code_uri: https://github.com/ruby-rdf/rdf-normalize
 post_install_message:
 rdoc_options: []
 require_paths:
@@ -131,7 +136,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.3.3
+rubygems_version: 3.4.13
 signing_key:
 specification_version: 4
 summary: RDF Graph normalizer for Ruby.
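The `metadata:` hash added in 0.6.0 uses the link keys that RubyGems.org recognizes (`documentation_uri`, `bug_tracker_uri`, `homepage_uri`, `mailing_list_uri`, `source_code_uri`). The project's own `.gemspec` is not part of this diff. The sketch below is a hypothetical reconstruction of only the fields visible in the metadata above, built with the standard `Gem::Specification` API.

```ruby
require 'rubygems'

# Hypothetical gemspec fragment reconstructing only fields visible in the
# 0.6.0 metadata; other dependencies are omitted because their versions
# are not fully shown in this diff.
SPEC = Gem::Specification.new do |s|
  s.name        = 'rdf-normalize'
  s.version     = '0.6.0'
  s.authors     = ['Gregg Kellogg']
  s.email       = 'public-rdf-ruby@w3.org'
  s.summary     = 'RDF Graph normalizer for Ruby.'
  s.description = 'RDF::Normalize performs Dataset Canonicalization for RDF.rb.'
  s.homepage    = 'https://github.com/ruby-rdf/rdf-normalize'
  s.license     = 'Unlicense'
  s.metadata    = {
    'documentation_uri' => 'https://ruby-rdf.github.io/rdf-normalize',
    'bug_tracker_uri'   => 'https://github.com/ruby-rdf/rdf-normalize/issues',
    'homepage_uri'      => 'https://github.com/ruby-rdf/rdf-normalize',
    'mailing_list_uri'  => 'https://lists.w3.org/Archives/Public/public-rdf-ruby/',
    'source_code_uri'   => 'https://github.com/ruby-rdf/rdf-normalize'
  }
  # json-ld is a development dependency (~> 3.2) per the metadata above.
  s.add_development_dependency 'json-ld', '~> 3.2'
end
```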

data/lib/rdf/normalize/urdna2015.rb DELETED

@@ -1,263 +0,0 @@
-module RDF::Normalize
-  class URDNA2015
-    include RDF::Enumerable
-    include RDF::Util::Logger
-    include Base
-    ##
-    # Create an enumerable with grounded nodes
-    #
-    # @param [RDF::Enumerable] enumerable
-    # @return [RDF::Enumerable]
-    def initialize(enumerable, **options)
-      @dataset, @options = enumerable, options
-    end
-    def each(&block)
-      ns = NormalizationState.new(@options)
-      normalize_statements(ns, &block)
-    end
-    protected
-    def normalize_statements(ns, &block)
-      # Map BNodes to the statements they are used by
-      dataset.each_statement do |statement|
-        statement.to_quad.compact.select(&:node?).each do |node|
-          ns.add_statement(node, statement)
-        end
-      end
-      non_normalized_identifiers, simple = ns.bnode_to_statements.keys, true
-      while simple
-        simple = false
-        ns.hash_to_bnodes = {}
-        # Calculate hashes for first degree nodes
-        non_normalized_identifiers.each do |node|
-          hash = log_depth {ns.hash_first_degree_quads(node)}
-          log_debug("1deg") {"hash: #{hash}"}
-          ns.add_bnode_hash(node, hash)
-        end
-        # Create canonical replacements for hashes mapping to a single node
-        ns.hash_to_bnodes.keys.sort.each do |hash|
-          identifier_list = ns.hash_to_bnodes[hash]
-          next if identifier_list.length > 1
-          node = identifier_list.first
-          id = ns.canonical_issuer.issue_identifier(node)
-          log_debug("single node") {"node: #{node.to_ntriples}, hash: #{hash}, id: #{id}"}
-          non_normalized_identifiers -= identifier_list
-          ns.hash_to_bnodes.delete(hash)
-          simple = true
-        end
-      end
-      # Iterate over hashs having more than one node
-      ns.hash_to_bnodes.keys.sort.each do |hash|
-        identifier_list = ns.hash_to_bnodes[hash]
-        log_debug("multiple nodes") {"node: #{identifier_list.map(&:to_ntriples).join(",")}, hash: #{hash}"}
-        hash_path_list = []
-        # Create a hash_path_list for all bnodes using a temporary identifier used to create canonical replacements
-        identifier_list.each do |identifier|
-          next if ns.canonical_issuer.issued.include?(identifier)
-          temporary_issuer = IdentifierIssuer.new("_:b")
-          temporary_issuer.issue_identifier(identifier)
-          hash_path_list << log_depth {ns.hash_n_degree_quads(identifier, temporary_issuer)}
-        end
-        log_debug("->") {"hash_path_list: #{hash_path_list.map(&:first).inspect}"}
-        # Create canonical replacements for nodes
-        hash_path_list.sort_by(&:first).map(&:last).each do |issuer|
-          issuer.issued.each do |node|
-            id = ns.canonical_issuer.issue_identifier(node)
-            log_debug("-->") {"node: #{node.to_ntriples}, id: #{id}"}
-          end
-        end
-      end
-      # Yield statements using BNodes from canonical replacements
-      dataset.each_statement do |statement|
-        if statement.has_blank_nodes?
-          quad = statement.to_quad.compact.map do |term|
-            term.node? ? RDF::Node.intern(ns.canonical_issuer.identifier(term)[2..-1]) : term
-          end
-          block.call RDF::Statement.from(quad)
-        else
-          block.call statement
-        end
-      end
-    end
-  private
-    class NormalizationState
-      include RDF::Util::Logger
-      attr_accessor :bnode_to_statements
-      attr_accessor :hash_to_bnodes
-      attr_accessor :canonical_issuer
-      def initialize(options)
-        @options = options
-        @bnode_to_statements, @hash_to_bnodes, @canonical_issuer = {}, {}, IdentifierIssuer.new("_:c14n")
-      end
-      def add_statement(node, statement)
-        bnode_to_statements[node] ||= []
-        bnode_to_statements[node] << statement unless bnode_to_statements[node].include?(statement)
-      end
-      def add_bnode_hash(node, hash)
-        hash_to_bnodes[hash] ||= []
-        hash_to_bnodes[hash] << node unless hash_to_bnodes[hash].include?(node)
-      end
-      # @param [RDF::Node] node
-      # @return [String] the SHA256 hexdigest hash of statements using this node, with replacements
-      def hash_first_degree_quads(node)
-        quads = bnode_to_statements[node].
-          map do |statement|
-            quad = statement.to_quad.map do |t|
-              case t
-              when node then RDF::Node("a")
-              when RDF::Node then RDF::Node("z")
-              else t
-              end
-            end
-            RDF::NQuads::Writer.serialize(RDF::Statement.from(quad))
-          end
-        log_debug("1deg") {"node: #{node}, quads: #{quads}"}
-        hexdigest(quads.sort.join)
-      end
-      # @param [RDF::Node] related
-      # @param [RDF::Statement] statement
-      # @param [IdentifierIssuer] issuer
-      # @param [String] position one of :s, :o, or :g
-      # @return [String] the SHA256 hexdigest hash
-      def hash_related_node(related, statement, issuer, position)
-        identifier = canonical_issuer.identifier(related) ||
-                     issuer.identifier(related) ||
-                     hash_first_degree_quads(related)
-        input = position.to_s
-        input << statement.predicate.to_ntriples unless position == :g
-        input << identifier
-        log_debug("hrel") {"input: #{input.inspect}, hash: #{hexdigest(input)}"}
-        hexdigest(input)
-      end
-      # @param [RDF::Node] identifier
-      # @param [IdentifierIssuer] issuer
-      # @return [Array<String,IdentifierIssuer>] the Hash and issuer
-      def hash_n_degree_quads(identifier, issuer)
-        log_debug("ndeg") {"identifier: #{identifier.to_ntriples}"}
-        # hash to related blank nodes map
-        map = {}
-        bnode_to_statements[identifier].each do |statement|
-          hash_related_statement(identifier, statement, issuer, map)
-        end
-        data_to_hash = ""
-        log_debug("ndeg") {"map: #{map.map {|h,l| "#{h}: #{l.map(&:to_ntriples)}"}.join('; ')}"}
-        log_depth do
-          map.keys.sort.each do |hash|
-            list = map[hash]
-            # Iterate over related nodes
-            chosen_path, chosen_issuer = "", nil
-            data_to_hash += hash
-            list.permutation do |permutation|
-              log_debug("ndeg") {"perm: #{permutation.map(&:to_ntriples).join(",")}"}
-              issuer_copy, path, recursion_list = issuer.dup, "", []
-              permutation.each do |related|
-                if canonical_issuer.identifier(related)
-                  path << canonical_issuer.issue_identifier(related)
-                else
-                  recursion_list << related if !issuer_copy.identifier(related)
-                  path << issuer_copy.issue_identifier(related)
-                end
-                # Skip to the next permutation if chosen path isn't empty and the path is greater than the chosen path
-                break if !chosen_path.empty? && path.length >= chosen_path.length
-              end
-              log_debug("ndeg") {"hash: #{hash}, path: #{path}, recursion: #{recursion_list.map(&:to_ntriples)}"}
-              recursion_list.each do |related|
-                result = log_depth {hash_n_degree_quads(related, issuer_copy)}
-                path << issuer_copy.issue_identifier(related)
-                path << "<#{result.first}>"
-                issuer_copy = result.last
-                break if !chosen_path.empty? && path.length >= chosen_path.length && path > chosen_path
-              end
-              if chosen_path.empty? || path < chosen_path
-                chosen_path, chosen_issuer = path, issuer_copy
-              end
-            end
-            data_to_hash += chosen_path
-            issuer = chosen_issuer
-          end
-        end
-        log_debug("ndeg") {"datatohash: #{data_to_hash.inspect}, hash: #{hexdigest(data_to_hash)}"}
-        return [hexdigest(data_to_hash), issuer]
-      end
-      protected
-      def hexdigest(val)
-        Digest::SHA256.hexdigest(val)
-      end
-      # Group adjacent bnodes by hash
-      def hash_related_statement(identifier, statement, issuer, map)
-        statement.to_h(:s, :p, :o, :g).each do |pos, term|
-          next if !term.is_a?(RDF::Node) || term == identifier
-          hash = log_depth {hash_related_node(term, statement, issuer, pos)}
-          map[hash] ||= []
-          map[hash] << term unless map[hash].include?(term)
-        end
-      end
-    end
-    class IdentifierIssuer
-      def initialize(prefix = "_:c14n")
-        @prefix, @counter, @issued = prefix, 0, {}
-      end
-      # Return an identifier for this BNode
-      def issue_identifier(node)
-        @issued[node] ||= begin
-          res, @counter = @prefix + @counter.to_s, @counter + 1
-          res
-        end
-      end
-      def issued
-        @issued.keys
-      end
-      def identifier(node)
-        @issued[node]
-      end
-      # Duplicate this issuer, ensuring that the issued identifiers remain distinct
-      # @return [IdentifierIssuer]
-      def dup
-        other = super
-        other.instance_variable_set(:@issued, @issued.dup)
-        other
-      end
-    end
-  end
-end
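The removed `hash_first_degree_quads` method hashes every statement that mentions a blank node. Before hashing, it rewrites that node to `_:a` and every other blank node to `_:z`, so the result does not depend on the original labels. Below is a minimal sketch of the same idea. It assumes terms are plain N-Quads strings rather than RDF.rb objects, and it uses a hypothetical `first_degree_hash` helper with a simplified one-line serializer in place of `RDF::NQuads::Writer`.

```ruby
require "digest"

# Hypothetical helper: a blank node is written "_:label"; IRIs are "<...>" and
# literals start with a double quote, so they never match.
def blank?(term)
  term.start_with?("_:")
end

# Sketch of a first-degree hash. +quads+ is an Array of [s, p, o] or
# [s, p, o, g] Arrays of N-Quads term strings.
def first_degree_hash(node, quads)
  lines = quads.select { |quad| quad.include?(node) }.map do |quad|
    terms = quad.map do |term|
      if term == node then "_:a"       # the blank node being hashed
      elsif blank?(term) then "_:z"    # every other blank node
      else term                        # IRIs and literals unchanged
      end
    end
    "#{terms.join(' ')} .\n"           # simplified N-Quads line
  end
  # Sort the serialized lines, concatenate them, and hash the result.
  Digest::SHA256.hexdigest(lines.sort.join)
end
```

Only the reference node is distinguished, so consistently renaming the blank nodes in the input leaves each node's hash unchanged. This label independence is what makes the hash usable as a sort key.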
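`hash_related_node` builds its input from three parts. It starts with the position (`s`, `o`, or `g`) and adds the predicate in angle brackets, which is omitted for `g`. It then adds the related node's identifier. For that identifier, the method tries the canonical identifier first, then the temporary one, and falls back to the first-degree hash. The sketch below follows the same lookup order. Plain Hashes stand in for the two `IdentifierIssuer` objects, and `related_hash` and the `first_degree` callable are hypothetical names.

```ruby
require "digest"

# Sketch of the related-blank-node hash (hypothetical names).
# canonical / temporary: Hashes of already-issued identifiers, standing in for
#   the canonical and temporary IdentifierIssuer objects.
# first_degree: callable returning the first-degree hash of a blank node.
def related_hash(related, predicate_iri, position, canonical, temporary, first_degree)
  # Same lookup order as the listing: canonical, then temporary, then first-degree hash.
  identifier = canonical[related] || temporary[related] || first_degree.call(related)
  input = position.to_s                                 # "s", "o", or "g"
  input += "<#{predicate_iri}>" unless position == :g   # no predicate for graph names
  input += identifier
  Digest::SHA256.hexdigest(input)
end
```

The sketch builds `input` with `+=` so that it never mutates a string it did not create.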
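Inside `hash_n_degree_quads`, each group of related blank nodes is tried in every permutation, each time with a copy of the temporary issuer, and the lexicographically least path wins. The sketch below isolates that selection step and omits the recursive hashing of newly labeled nodes. Several details are assumptions of the sketch:

- `TempIssuer` and `choose_path` are hypothetical names.
- The `_:b` prefix is assumed.
- Copying uses `initialize_copy`, the usual Ruby hook behind `dup`, instead of overriding `dup` as `IdentifierIssuer` does.

The pruning test follows the URDNA2015 description: it requires the partial path to be both at least as long as the chosen path and lexicographically greater than it. By contrast, the `break` under the "Skip to the next permutation" comment in the listing checks only the length.

```ruby
# Hypothetical stand-in for IdentifierIssuer.
class TempIssuer
  attr_reader :issued

  def initialize(prefix = "_:b")
    @prefix, @counter, @issued = prefix, 0, {}
  end

  # Returns the existing identifier for +node+, or issues the next one.
  def issue(node)
    @issued[node] ||= begin
      id = "#{@prefix}#{@counter}"
      @counter += 1
      id
    end
  end

  # Called by #dup: give the copy its own map so that issuing on the copy
  # never leaks back into the original.
  def initialize_copy(source)
    super
    @issued = @issued.dup
  end
end

# Returns [path, issuer] for the lexicographically least path over all
# permutations of +related+. Recursion on unlabeled nodes is omitted.
def choose_path(related, canonical, issuer)
  chosen_path, chosen_issuer = nil, nil
  related.permutation do |perm|
    issuer_copy = issuer.dup
    path = +""
    pruned = perm.any? do |node|
      path << (canonical[node] || issuer_copy.issue(node))
      # The path only grows, so once it is at least as long as the chosen
      # path and sorts after it, this permutation cannot win.
      !chosen_path.nil? && path.length >= chosen_path.length && path > chosen_path
    end
    next if pruned
    chosen_path, chosen_issuer = path, issuer_copy if chosen_path.nil? || path < chosen_path
  end
  [chosen_path, chosen_issuer]
end
```

For example, with `canonical = { "y" => "_:c14n0" }` the order `["x", "y"]` wins, because `"_:b0_:c14n0"` sorts before `"_:c14n0_:b0"`. The issuer passed in is never modified, because each permutation works on its own copy.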
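Once the n-degree hashes are computed, the listing sorts `hash_path_list` by hash. It then issues canonical identifiers to each temporary issuer's nodes in the order they were issued; Ruby Hashes keep insertion order, so `issued` returns the keys in that order. Finally, it rewrites every statement that contains blank nodes. A minimal sketch with plain Hashes follows. It assumes the `_:c14n` prefix used by `canonical_issuer`, and `issue_canonical` and `relabel` are hypothetical helpers.

```ruby
# Hypothetical helper: issue canonical identifiers group by group, in hash order.
# +hash_path_list+ is an Array of [hash, nodes] pairs; +nodes+ are listed in the
# order a temporary issuer labeled them. Assumes +canonical+ only ever holds
# sequential "_:c14nN" identifiers issued this way.
def issue_canonical(hash_path_list, canonical = {})
  hash_path_list.sort_by(&:first).each do |_hash, nodes|
    nodes.each do |node|
      # Keep an existing identifier; otherwise issue the next "_:c14nN".
      canonical[node] ||= "_:c14n#{canonical.size}"
    end
  end
  canonical
end

# Hypothetical helper: replace every blank-node term ("_:..." string) with its
# canonical label; fetch raises KeyError if a blank node was never issued.
def relabel(quads, canonical)
  quads.map do |quad|
    quad.map { |term| term.start_with?("_:") ? canonical.fetch(term) : term }
  end
end
```

The original strips the `_:` prefix (`[2..-1]`) because `RDF::Node.intern` takes the bare label. The sketch keeps the prefix because it works directly with N-Quads strings.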