rdf-normalize 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: d7076dcfeccdbfc0b35ec046d0b338a6ad41d776
4
+ data.tar.gz: cd5f278797b575a3a6cced04890b9014c2350f42
5
+ SHA512:
6
+ metadata.gz: 2510cec72f19af6eef55678382f688a5948b59fad4c6be18465c53a66d16b25bb543dadbd5a5148675d291afac133c8cc7d5399650ad9043b858c1a1f6165291
7
+ data.tar.gz: f785bd00b4abacf7da181daae96e2101aa449b19791a08742654451cf9a3b25abdf368dd77d99a86c4f74a49084b2d9af6464ea30bbed45589f87713e899ee63
data/AUTHORS ADDED
@@ -0,0 +1 @@
1
+ * Gregg Kellogg <gregg@greggkellogg.net>
data/LICENSE ADDED
@@ -0,0 +1,25 @@
1
+ This is free and unencumbered software released into the public domain.
2
+
3
+ Anyone is free to copy, modify, publish, use, compile, sell, or
4
+ distribute this software, either in source code form or as a compiled
5
+ binary, for any purpose, commercial or non-commercial, and by any
6
+ means.
7
+
8
+ In jurisdictions that recognize copyright laws, the author or authors
9
+ of this software dedicate any and all copyright interest in the
10
+ software to the public domain. We make this dedication for the benefit
11
+ of the public at large and to the detriment of our heirs and
12
+ successors. We intend this dedication to be an overt act of
13
+ relinquishment in perpetuity of all present and future rights to this
14
+ software under copyright law.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
19
+ IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
20
+ OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
21
+ ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
22
+ OTHER DEALINGS IN THE SOFTWARE.
23
+
24
+ For more information, please refer to <http://unlicense.org>
25
+
data/README.md ADDED
@@ -0,0 +1,82 @@
1
+ # RDF::Normalize
2
+ RDF Graph normalizer for [RDF.rb][RDF.rb].
3
+
4
+ [![Gem Version](https://badge.fury.io/rb/rdf-normalize.png)](http://badge.fury.io/rb/rdf-normalize)
5
+ [![Build Status](https://secure.travis-ci.org/ruby-rdf/rdf-normalize.png?branch=master)](http://travis-ci.org/ruby-rdf/rdf-normalize)
6
+
7
+ ## Description
8
+ This is a [Ruby][] implementation of [RDF Normalize][] for [RDF.rb][].
9
+
10
+ ## Features
11
+ RDF::Normalize generates normalized [N-Quads][] output for an RDF Dataset using the algorithm
12
+ defined in [RDF Normalize][]. It also implements an RDF Writer interface, which can be used
13
+ to serialize normalized statements.
14
+
15
+ Algorithms implemented:
16
+
17
+ * [URGNA2012](http://json-ld.github.io/normalization/spec/index.html#dfn-urgna2012)
18
+ * [URDNA2015](http://json-ld.github.io/normalization/spec/index.html#dfn-urdna2015)
19
+
20
+ Install with `gem install rdf-normalize`
21
+
22
+ * 100% free and unencumbered [public domain](http://unlicense.org/) software.
23
+ * Compatible with Ruby >= 1.9.2.
24
+
25
+ ## Usage
26
+
27
+ ## Documentation
28
+ Full documentation available on [Rubydoc.info][Normalize doc]
29
+
30
+ ### Principal Classes
31
+ * {RDF::Normalize}
32
+ * {RDF::Normalize::Base}
33
+ * {RDF::Normalize::Format}
34
+ * {RDF::Normalize::Writer}
35
+ * {RDF::Normalize::URGNA2012}
36
+ * {RDF::Normalize::URDNA2015}
37
+
38
+
39
+ ## Dependencies
40
+
41
+ * [Ruby](http://ruby-lang.org/) (>= 1.9.2)
42
+ * [RDF.rb](http://rubygems.org/gems/rdf) (~> 1.1)
43
+
44
+ ## Installation
45
+
46
+ The recommended installation method is via [RubyGems](http://rubygems.org/).
47
+ To install the latest official release of the `RDF::Normalize` gem, do:
48
+
49
+ % [sudo] gem install rdf-normalize
50
+
51
+ ## Mailing List
52
+ * <http://lists.w3.org/Archives/Public/public-rdf-ruby/>
53
+
54
+ ## Author
55
+ * [Gregg Kellogg](http://github.com/gkellogg) - <http://greggkellogg.net/>
56
+
57
+ ## Contributing
58
+ * Do your best to adhere to the existing coding conventions and idioms.
59
+ * Don't use hard tabs, and don't leave trailing whitespace on any line.
60
+ * Do document every method you add using [YARD][] annotations. Read the
61
+ [tutorial][YARD-GS] or just look at the existing code for examples.
62
+ * Don't touch the `.gemspec`, `VERSION` or `AUTHORS` files. If you need to
63
+ change them, do so on your private branch only.
64
+ * Do feel free to add yourself to the `CREDITS` file and the corresponding
65
+ list in the `README`. Alphabetical order applies.
66
+ * Do note that in order for us to merge any non-trivial changes (as a rule
67
+ of thumb, additions larger than about 15 lines of code), we need an
68
+ explicit [public domain dedication][PDD] on record from you.
69
+
70
+ ## License
71
+ This is free and unencumbered public domain software. For more information,
72
+ see <http://unlicense.org/> or the accompanying {file:LICENSE} file.
73
+
74
+ [Ruby]: http://ruby-lang.org/
75
+ [RDF]: http://www.w3.org/RDF/
76
+ [YARD]: http://yardoc.org/
77
+ [YARD-GS]: http://rubydoc.info/docs/yard/file/docs/GettingStarted.md
78
+ [PDD]: http://lists.w3.org/Archives/Public/public-rdf-ruby/2010May/0013.html
79
+ [RDF.rb]: http://rubydoc.info/github/ruby-rdf/rdf-normalize
80
+ [N-Quads]: http://www.w3.org/TR/n-quads/
81
+ [RDF Normalize]:http://json-ld.github.io/normalization/spec/
82
+ [Normalize doc]:http://rubydoc.info/github/ruby-rdf/rdf-normalize/master/file/README.markdown
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.1.0
data/lib/rdf/normalize.rb ADDED
@@ -0,0 +1,66 @@
1
+ require 'rdf'
2
+
3
+ module RDF
4
+ ##
5
+ # **`RDF::Normalize`** is an RDF Graph normalization plugin for RDF.rb.
6
+ #
7
+ # @example Requiring the `RDF::Normalize` module
8
+ # require 'rdf/normalize'
9
+ #
10
+ # @example Returning an iterator for normalized statements
11
+ #
12
+ # g = RDF::Graph.load("etc/doap.ttl")
13
+ # RDF::Normalize.new(g).each_statement do |statement|
14
+ # puts statement.inspect
15
+ # end
16
+ #
17
+ # @example Returning normalized N-Quads
18
+ #
19
+ # g = RDF::Graph.load("etc/doap.ttl")
20
+ # g.dump(:normalize)
21
+ #
22
+ # @example Writing a repository as normalized N-Quads
23
+ #
24
+ # RDF::Normalize::Writer.open("etc/doap.nq") do |writer|
25
+ # writer << RDF::Repository.load("etc/doap.ttl")
26
+ # end
27
+ #
28
+ # @author [Gregg Kellogg](http://greggkellogg.net/)
29
+ module Normalize
30
+ require 'rdf/normalize/format'
31
+ require 'rdf/normalize/utils'
32
+ autoload :Base, 'rdf/normalize/base'
33
+ autoload :Carroll2001,'rdf/normalize/carroll2001'
34
+ autoload :URGNA2012, 'rdf/normalize/urgna2012'
35
+ autoload :URDNA2015, 'rdf/normalize/urdna2015'
36
+ autoload :VERSION, 'rdf/normalize/version'
37
+ autoload :Writer, 'rdf/normalize/writer'
38
+
39
+ # Enumerable to normalize
40
+ # @return [RDF::Enumerable]
41
+ attr_accessor :dataset
42
+
43
+ ALGORITHMS = {
44
+ carroll2001: :Carroll2001,
45
+ urgna2012: :URGNA2012,
46
+ urdna2015: :URDNA2015
47
+ }.freeze
48
+
49
+ ##
50
+ # Creates a new normalizer instance using either the specified or default normalizer algorithm
51
+ # @param [RDF::Enumerable] enumerable
52
+ # @param [Hash{Symbol => Object}] options
53
+ # @option options [Symbol] :algorithm (:urdna2015)
54
+ # One of `:carroll2001`, `:urgna2012`, or `:urdna2015`
55
+ # @return [RDF::Normalize::Base]
56
+ # @raise [ArgumentError] selected algorithm not defined
57
+ def new(enumerable, options = {})
58
+ algorithm = options.fetch(:algorithm, :urdna2015)
59
+ raise ArgumentError, "No algorithm defined for #{algorithm.to_sym}" unless ALGORITHMS.has_key?(algorithm)
60
+ algorithm_class = const_get(ALGORITHMS[algorithm])
61
+ algorithm_class.new(enumerable, options)
62
+ end
63
+ module_function :new
64
+
65
+ end
66
+ end
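The dispatch in `RDF::Normalize.new` above (look up a symbol in `ALGORITHMS`, resolve the constant with `const_get`, then instantiate it) can be sketched without the `rdf` gem. This is a minimal illustration only: `DemoNormalize`, `Alpha`, and `Beta` are hypothetical names that do not exist in the package.

```ruby
# Hypothetical stand-in for the dispatch pattern; none of these names are in the gem.
module DemoNormalize
  class Alpha
    attr_reader :input

    def initialize(input, options)
      @input = input
    end
  end

  class Beta < Alpha; end

  # Option symbol => constant name, resolved on demand with const_get.
  ALGORITHMS = { alpha: :Alpha, beta: :Beta }.freeze

  # Module-level factory: returns an instance of the selected class.
  def new(input, options = {})
    algorithm = options.fetch(:algorithm, :alpha)
    raise ArgumentError, "No algorithm defined for #{algorithm}" unless ALGORITHMS.key?(algorithm)
    const_get(ALGORITHMS[algorithm]).new(input, options)
  end
  module_function :new
end
```

Calling `DemoNormalize.new([1], {algorithm: :beta})` would return a `Beta` instance, while an unknown symbol raises `ArgumentError` before any constant lookup happens.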
data/lib/rdf/normalize/base.rb ADDED
@@ -0,0 +1,15 @@
1
+ module RDF::Normalize
2
+ ##
3
+ # Abstract class for pluggable normalization algorithms. Delegates to a default or selected algorithm if instantiated
4
+ module Base
5
+ attr_reader :dataset
6
+
7
+ # Enumerates normalized statements
8
+ #
9
+ # @yield statement
10
+ # @yieldparam [RDF::Statement] statement
11
+ def each(&block)
12
+ raise "Not Implemented"
13
+ end
14
+ end
15
+ end
data/lib/rdf/normalize/carroll2001.rb ADDED
@@ -0,0 +1,166 @@
1
+ module RDF::Normalize
2
+ class Carroll2001
3
+ include RDF::Enumerable
4
+ include Base
5
+ include Utils
6
+
7
+ ##
8
+ # Create an enumerable with grounded nodes
9
+ #
10
+ # @param [RDF::Enumerable] enumerable
11
+ # @return [RDF::Enumerable]
12
+ def initialize(enumerable, options)
13
+ @dataset = enumerable
14
+ end
15
+
16
+ def each(&block)
17
+ ground_statements, anon_statements = [], []
18
+ dataset.each_statement do |statement|
19
+ (statement.has_blank_nodes? ? anon_statements : ground_statements) << statement
20
+ end
21
+
22
+ nodes = anon_statements.map(&:to_quad).flatten.compact.select(&:node?).uniq
23
+
24
+ # Create a hash signature of every node, based on the signature of
25
+ # statements it exists in.
26
+ # We also save hashes of nodes that cannot be reliably known; we will use
27
+ # that information to eliminate possible recursion combinations.
28
+ #
29
+ # Any mappings given in the method parameters are considered grounded.
30
+ hashes, ungrounded_hashes = hash_nodes(anon_statements, nodes, {})
31
+
32
+ # FIXME: likely need to iterate until hashes and ungrounded_hashes are the same size
33
+ while hashes.size != ungrounded_hashes.size
34
+ raise "Not done"
35
+ end
36
+
37
+ # Enumerate all statements, replacing nodes with new ground nodes using the hash as an identifier
38
+ ground_statements.each(&block)
39
+ anon_statements.each do |statement|
40
+ quad = statement.to_quad.compact.map do |term|
41
+ term.node? ? RDF::Node.intern(hashes[term]) : term
42
+ end
43
+ block.call RDF::Statement.from(quad)
44
+ end
45
+ end
46
+
47
+ private
48
+
49
+ # Given a set of statements, create a mapping of node => SHA1 for a given
50
+ # set of blank nodes.
51
+ #
52
+ # Returns a tuple of hashes: one of grounded hashes, and one of all
53
+ # hashes. grounded hashes are based on non-blank nodes and grounded blank
54
+ # nodes, and can be used to determine if a node's signature matches
55
+ # another.
56
+ #
57
+ # @param [Array] statements
58
+ # @param [Array] nodes
59
+ # @param [Hash] grounded_hashes
60
+ # mapping of node => SHA1 pairs as input, used to create more specific signatures of other nodes.
61
+ # @private
62
+ # @return [Hash, Hash]
63
+ def hash_nodes(statements, nodes, grounded_hashes)
64
+ hashes = grounded_hashes.dup
65
+ ungrounded_hashes = {}
66
+ hash_needed = true
67
+
68
+ # We may have to go over the list multiple times. If a node is marked as
69
+ # grounded, other nodes can then use it to decide their own state of
70
+ # grounded.
71
+ while hash_needed
72
+ starting_grounded_nodes = hashes.size
73
+ nodes.each do | node |
74
+ unless hashes.member? node
75
+ grounded, hash = node_hash_for(node, statements, hashes)
76
+ if grounded
77
+ hashes[node] = hash
78
+ end
79
+ ungrounded_hashes[node] = hash
80
+ end
81
+ end
82
+
83
+ # after going over the list, any nodes with a unique hash can be marked
84
+ # as grounded, even if we have not tied them back to a root yet.
85
+ uniques = {}
86
+ ungrounded_hashes.each do |node, hash|
87
+ uniques[hash] = uniques.has_key?(hash) ? false : node
88
+ end
89
+ uniques.each do |hash, node|
90
+ hashes[node] = hash if node
91
+ end
92
+ hash_needed = starting_grounded_nodes != hashes.size
93
+ end
94
+ [hashes, ungrounded_hashes]
95
+ end
96
+
97
+ # Generate a hash for a node based on the signature of the statements it
98
+ # appears in. Signatures consist of grounded elements in statements
99
+ # associated with a node, that is, anything but an ungrounded anonymous
100
+ # node. Creating the hash is simply hashing a sorted list of each
101
+ # statement's signature, which is itself a concatenation of the string form
102
+ # of all grounded elements.
103
+ #
104
+ # Nodes other than the given node are considered grounded if they are a
105
+ # member in the given hash.
106
+ #
107
+ # @param [RDF::Node] node
108
+ # @param [Array<RDF::Statement>] statements
109
+ # @param [Hash] hashes
110
+ # @return [Boolean, String]
111
+ # a tuple consisting of grounded being true or false and the String for the hash
112
+ def node_hash_for(node, statements, hashes)
113
+ statement_signatures = []
114
+ grounded = true
115
+ statements.each do | statement |
116
+ if statement.to_quad.include?(node)
117
+ statement_signatures << hash_string_for(statement, hashes, node)
118
+ statement.to_quad.compact.each do | resource |
119
+ grounded = false unless grounded?(resource, hashes) || resource == node
120
+ end
121
+ end
122
+ end
123
+ # Note that we sort the signatures--without a canonical ordering,
124
+ # we might get different hashes for equivalent nodes.
125
+ [grounded,Digest::SHA1.hexdigest(statement_signatures.sort.to_s)]
126
+ end
127
+
128
+ # Provide a string signature for the given statement, collecting
129
+ # string signatures for grounded node elements.
130
+ # @return [String]
131
+ def hash_string_for(statement, hashes, node)
132
+ statement.to_quad.map {|r| string_for_node(r, hashes, node)}.join("")
133
+ end
134
+
135
+ # Returns true if a given node is grounded
136
+ # A node is grounded if it is not a blank node or it is included
137
+ # in the given mapping of grounded nodes.
138
+ # @return [Boolean]
139
+ def grounded?(node, hashes)
140
+ (!(node.node?)) || (hashes.member? node)
141
+ end
142
+
143
+ # Provides a string for the given node for use in a string signature
144
+ # Non-anonymous nodes will return their string form. Grounded anonymous
145
+ # nodes will return their hashed form.
146
+ # @return [String]
147
+ def string_for_node(node, hashes, target)
148
+ case
149
+ when node.nil?
150
+ ""
151
+ when node == target
152
+ "itself"
153
+ when node.node? && hashes.member?(node)
154
+ hashes[node]
155
+ when node.node?
156
+ "a blank node"
157
+ # RDF.rb auto-boxing magic makes some literals the same when they
158
+ # should not be; the ntriples serializer will take care of us
159
+ when node.literal?
160
+ node.class.name + RDF::NTriples.serialize(node)
161
+ else
162
+ node.to_s
163
+ end
164
+ end
165
+ end
166
+ end
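The "unique hash" step inside `hash_nodes` above treats a node as grounded when no other node shares its hash. A standalone sketch of just that step follows, assuming plain strings in place of `RDF::Node` objects; `unique_hashes` is a hypothetical helper name, not a method of the gem.

```ruby
# Hypothetical helper: given node => hash pairs, keep only the pairs whose
# hash is held by exactly one node.
def unique_hashes(ungrounded)
  uniques = {}
  ungrounded.each do |node, hash|
    # The first sighting stores the node; any later sighting poisons the slot.
    uniques[hash] = uniques.key?(hash) ? false : node
  end
  uniques.each_with_object({}) do |(hash, node), grounded|
    grounded[node] = hash if node
  end
end
```

Nodes that collide on a hash stay ungrounded, which is the case the `FIXME` in `each` does not yet resolve.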
data/lib/rdf/normalize/format.rb ADDED
@@ -0,0 +1,11 @@
1
+ require 'rdf/nquads'
2
+
3
+ module RDF::Normalize
4
+ class Format < RDF::Format
5
+ content_encoding 'utf-8'
6
+
7
+ # It reads like normal N-Quads
8
+ reader { RDF::NQuads::Reader}
9
+ writer { RDF::Normalize::Writer }
10
+ end
11
+ end
data/lib/rdf/normalize/urdna2015.rb ADDED
@@ -0,0 +1,264 @@
1
+ module RDF::Normalize
2
+ class URDNA2015
3
+ include RDF::Enumerable
4
+ include Base
5
+ include Utils
6
+
7
+ ##
8
+ # Create an enumerable with grounded nodes
9
+ #
10
+ # @param [RDF::Enumerable] enumerable
11
+ # @return [RDF::Enumerable]
12
+ def initialize(enumerable, options)
13
+ @dataset, @options = enumerable, options
14
+ end
15
+
16
+ def each(&block)
17
+ ns = NormalizationState.new(@options)
18
+ normalize_statements(ns, &block)
19
+ end
20
+
21
+ protected
22
+ def normalize_statements(ns, &block)
23
+ # Map BNodes to the statements they are used by
24
+ dataset.each_statement do |statement|
25
+ statement.to_quad.compact.select(&:node?).each do |node|
26
+ ns.add_statement(node, statement)
27
+ end
28
+ end
29
+
30
+ non_normalized_identifiers, simple = ns.bnode_to_statements.keys, true
31
+
32
+ while simple
33
+ simple = false
34
+ ns.hash_to_bnodes = {}
35
+
36
+ # Calculate hashes for first degree nodes
37
+ non_normalized_identifiers.each do |node|
38
+ hash = depth {ns.hash_first_degree_quads(node)}
39
+ debug("1deg") {"hash: #{hash}"}
40
+ ns.add_bnode_hash(node, hash)
41
+ end
42
+
43
+ # Create canonical replacements for hashes mapping to a single node
44
+ ns.hash_to_bnodes.keys.sort.each do |hash|
45
+ identifier_list = ns.hash_to_bnodes[hash]
46
+ next if identifier_list.length > 1
47
+ node = identifier_list.first
48
+ id = ns.canonical_issuer.issue_identifier(node)
49
+ debug("single node") {"node: #{node.to_ntriples}, hash: #{hash}, id: #{id}"}
50
+ non_normalized_identifiers -= identifier_list
51
+ ns.hash_to_bnodes.delete(hash)
52
+ simple = true
53
+ end
54
+ end
55
+
56
+ # Iterate over hashes having more than one node
57
+ ns.hash_to_bnodes.keys.sort.each do |hash|
58
+ identifier_list = ns.hash_to_bnodes[hash]
59
+
60
+ debug("multiple nodes") {"node: #{identifier_list.map(&:to_ntriples).join(",")}, hash: #{hash}"}
61
+ hash_path_list = []
62
+
63
+ # Create a hash_path_list for all bnodes using a temporary identifier used to create canonical replacements
64
+ identifier_list.each do |identifier|
65
+ next if ns.canonical_issuer.issued.include?(identifier)
66
+ temporary_issuer = IdentifierIssuer.new("_:b")
67
+ temporary_issuer.issue_identifier(identifier)
68
+ hash_path_list << depth {ns.hash_n_degree_quads(identifier, temporary_issuer)}
69
+ end
70
+ debug("->") {"hash_path_list: #{hash_path_list.map(&:first).inspect}"}
71
+
72
+ # Create canonical replacements for nodes
73
+ hash_path_list.sort_by(&:first).map(&:last).each do |issuer|
74
+ issuer.issued.each do |node|
75
+ id = ns.canonical_issuer.issue_identifier(node)
76
+ debug("-->") {"node: #{node.to_ntriples}, id: #{id}"}
77
+ end
78
+ end
79
+ end
80
+
81
+ # Yield statements using BNodes from canonical replacements
82
+ dataset.each_statement do |statement|
83
+ if statement.has_blank_nodes?
84
+ quad = statement.to_quad.compact.map do |term|
85
+ term.node? ? RDF::Node.intern(ns.canonical_issuer.identifier(term)[2..-1]) : term
86
+ end
87
+ block.call RDF::Statement.from(quad)
88
+ else
89
+ block.call statement
90
+ end
91
+ end
92
+ end
93
+
94
+ private
95
+
96
+ class NormalizationState
97
+ include Utils
98
+
99
+ attr_accessor :bnode_to_statements
100
+ attr_accessor :hash_to_bnodes
101
+ attr_accessor :canonical_issuer
102
+
103
+ def initialize(options)
104
+ @options = options
105
+ @bnode_to_statements, @hash_to_bnodes, @canonical_issuer = {}, {}, IdentifierIssuer.new("_:c14n")
106
+ end
107
+
108
+ def add_statement(node, statement)
109
+ bnode_to_statements[node] ||= []
110
+ bnode_to_statements[node] << statement unless bnode_to_statements[node].include?(statement)
111
+ end
112
+
113
+ def add_bnode_hash(node, hash)
114
+ hash_to_bnodes[hash] ||= []
115
+ hash_to_bnodes[hash] << node unless hash_to_bnodes[hash].include?(node)
116
+ end
117
+
118
+ # @param [RDF::Node] node
119
+ # @return [String] the SHA1 hexdigest hash of statements using this node, with replacements
120
+ def hash_first_degree_quads(node)
121
+ quads = bnode_to_statements[node].
122
+ map do |statement|
123
+ quad = statement.to_quad.map do |t|
124
+ case t
125
+ when node then RDF::Node("a")
126
+ when RDF::Node then RDF::Node("z")
127
+ else t
128
+ end
129
+ end
130
+ RDF::NQuads::Writer.serialize(RDF::Statement.from(quad))
131
+ end
132
+
133
+ debug("1deg") {"node: #{node}, quads: #{quads}"}
134
+ hexdigest(quads.sort.join)
135
+ end
136
+
137
+ # @param [RDF::Node] related
138
+ # @param [RDF::Statement] statement
139
+ # @param [IdentifierIssuer] issuer
140
+ # @param [Symbol] position one of :s, :o, or :g
141
+ # @return [String] the SHA1 hexdigest hash
142
+ def hash_related_node(related, statement, issuer, position)
143
+ identifier = canonical_issuer.identifier(related) ||
144
+ issuer.identifier(related) ||
145
+ hash_first_degree_quads(related)
146
+ input = position.to_s
147
+ input << statement.predicate.to_ntriples unless position == :g
148
+ input << identifier
149
+ debug("hrel") {"input: #{input.inspect}, hash: #{hexdigest(input)}"}
150
+ hexdigest(input)
151
+ end
152
+
153
+ # @param [RDF::Node] identifier
154
+ # @param [IdentifierIssuer] issuer
155
+ # @return [Array<String,IdentifierIssuer>] the Hash and issuer
156
+ def hash_n_degree_quads(identifier, issuer)
157
+ debug("ndeg") {"identifier: #{identifier.to_ntriples}"}
158
+
159
+ # hash to related blank nodes map
160
+ map = {}
161
+
162
+ bnode_to_statements[identifier].each do |statement|
163
+ hash_related_statement(identifier, statement, issuer, map)
164
+ end
165
+
166
+ data_to_hash = ""
167
+
168
+ debug("ndeg") {"map: #{map.map {|h,l| "#{h}: #{l.map(&:to_ntriples)}"}.join('; ')}"}
169
+ depth do
170
+ map.keys.sort.each do |hash|
171
+ list = map[hash]
172
+ # Iterate over related nodes
173
+ chosen_path, chosen_issuer = "", nil
174
+ data_to_hash += hash
175
+
176
+ list.permutation do |permutation|
177
+ debug("ndeg") {"perm: #{permutation.map(&:to_ntriples).join(",")}"}
178
+ issuer_copy, path, recursion_list = issuer.dup, "", []
179
+
180
+ permutation.each do |related|
181
+ if canonical_issuer.identifier(related)
182
+ path << canonical_issuer.issue_identifier(related)
183
+ else
184
+ recursion_list << related if !issuer_copy.identifier(related)
185
+ path << issuer_copy.issue_identifier(related)
186
+ end
187
+
188
+ # Skip to the next permutation if chosen path isn't empty and the path is greater than the chosen path
189
+ break if !chosen_path.empty? && path.length >= chosen_path.length
190
+ end
191
+ debug("ndeg") {"hash: #{hash}, path: #{path}, recursion: #{recursion_list.map(&:to_ntriples)}"}
192
+
193
+ recursion_list.each do |related|
194
+ result = depth {hash_n_degree_quads(related, issuer_copy)}
195
+ path << issuer_copy.issue_identifier(related)
196
+ path << "<#{result.first}>"
197
+ issuer_copy = result.last
198
+ break if !chosen_path.empty? && path.length >= chosen_path.length && path > chosen_path
199
+ end
200
+
201
+ if chosen_path.empty? || path < chosen_path
202
+ chosen_path, chosen_issuer = path, issuer_copy
203
+ end
204
+ end
205
+
206
+ data_to_hash += chosen_path
207
+ issuer = chosen_issuer
208
+ end
209
+ end
210
+
211
+ debug("ndeg") {"datatohash: #{data_to_hash.inspect}, hash: #{hexdigest(data_to_hash)}"}
212
+ return [hexdigest(data_to_hash), issuer]
213
+ end
214
+
215
+ protected
216
+
217
+ # FIXME: should be SHA-256.
218
+ def hexdigest(val)
219
+ Digest::SHA1.hexdigest(val)
220
+ end
221
+
222
+ # Group adjacent bnodes by hash
223
+ def hash_related_statement(identifier, statement, issuer, map)
224
+ statement.to_hash(:s, :p, :o, :g).each do |pos, term|
225
+ next if !term.is_a?(RDF::Node) || term == identifier
226
+
227
+ hash = depth {hash_related_node(term, statement, issuer, pos)}
228
+ map[hash] ||= []
229
+ map[hash] << term unless map[hash].include?(term)
230
+ end
231
+ end
232
+ end
233
+
234
+ class IdentifierIssuer
235
+ def initialize(prefix = "_:c14n")
236
+ @prefix, @counter, @issued = prefix, 0, {}
237
+ end
238
+
239
+ # Return an identifier for this BNode
240
+ def issue_identifier(node)
241
+ @issued[node] ||= begin
242
+ res, @counter = @prefix + @counter.to_s, @counter + 1
243
+ res
244
+ end
245
+ end
246
+
247
+ def issued
248
+ @issued.keys
249
+ end
250
+
251
+ def identifier(node)
252
+ @issued[node]
253
+ end
254
+
255
+ # Duplicate this issuer, ensuring that the issued identifiers remain distinct
256
+ # @return [IdentifierIssuer]
257
+ def dup
258
+ other = super
259
+ other.instance_variable_set(:@issued, @issued.dup)
260
+ other
261
+ end
262
+ end
263
+ end
264
+ end
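The `IdentifierIssuer` above hands out sequential identifiers and must keep a copy's table separate from the original, since `hash_n_degree_quads` tries each permutation on a copy. A minimal sketch of that behaviour, assuming symbols in place of blank nodes, is shown below. `DemoIssuer` is a hypothetical name, and it uses `initialize_copy` rather than the `dup` override in the diff, which is a swapped-in but equivalent technique.

```ruby
# Hypothetical re-creation of the issuer idea; not the gem's class.
class DemoIssuer
  def initialize(prefix = "_:c14n")
    @prefix = prefix
    @counter = 0
    @issued = {}
  end

  # Returns the existing identifier for node, or issues the next one.
  def issue(node)
    @issued[node] ||= begin
      id = "#{@prefix}#{@counter}"
      @counter += 1
      id
    end
  end

  # Returns the identifier already issued for node, or nil.
  def identifier(node)
    @issued[node]
  end

  # Called by dup/clone: give the copy its own table so later issues diverge.
  def initialize_copy(source)
    super
    @issued = @issued.dup
  end
end
```

Identifiers issued on a copy never leak back into the original, so a rejected permutation leaves the caller's issuer untouched.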
data/lib/rdf/normalize/urgna2012.rb ADDED
@@ -0,0 +1,47 @@
1
+ module RDF::Normalize
2
+ class URGNA2012 < URDNA2015
3
+
4
+ def each(&block)
5
+ ns = NormalizationState.new(@options)
6
+ normalize_statements(ns, &block)
7
+ end
8
+
9
+ class NormalizationState < URDNA2015::NormalizationState
10
+ protected
11
+
12
+ # 2012 version uses SHA-1
13
+ def hexdigest(val)
14
+ Digest::SHA1.hexdigest(val)
15
+ end
16
+
17
+ # @param [RDF::Node] related
18
+ # @param [RDF::Statement] statement
19
+ # @param [IdentifierIssuer] issuer
20
+ # @param [Symbol] position one of :p or :r
21
+ # @return [String] the SHA1 hexdigest hash
22
+ def hash_related_node(related, statement, issuer, position)
23
+ identifier = canonical_issuer.identifier(related) ||
24
+ issuer.identifier(related) ||
25
+ hash_first_degree_quads(related)
26
+ input = position.to_s
27
+ input << statement.predicate.to_s
28
+ input << identifier
29
+ debug("hrel") {"input: #{input.inspect}, hash: #{hexdigest(input)}"}
30
+ hexdigest(input)
31
+ end
32
+
33
+ # In URGNA2012, the position parameter passed to the Hash Related Blank Node algorithm was instead modeled as a direction parameter, where it could have the value p, for property, when the related blank node was a `subject` and the value r, for reverse or reference, when the related blank node was an `object`. Since URGNA2012 only normalized graphs, not datasets, there was no use of the `graph` position.
34
+ def hash_related_statement(identifier, statement, issuer, map)
35
+ if statement.subject.node? && statement.subject != identifier
36
+ hash = depth {hash_related_node(statement.subject, statement, issuer, :p)}
37
+ map[hash] ||= []
38
+ map[hash] << statement.subject unless map[hash].include?(statement.subject)
39
+ elsif statement.object.node? && statement.object != identifier
40
+ hash = depth {hash_related_node(statement.object, statement, issuer, :r)}
41
+ map[hash] ||= []
42
+ map[hash] << statement.object unless map[hash].include?(statement.object)
43
+ end
44
+ end
45
+ end
46
+ end
47
+ end
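The two `hash_related_node` variants differ only in the string they feed to the digest. The sketch below builds both inputs from plain strings, assuming the predicate is passed as a bare IRI string. The 2015 form wraps it in angle brackets (as `to_ntriples` does) and omits it for the graph position, while the 2012 form appends it verbatim. Both function names are hypothetical.

```ruby
require 'digest'

# URDNA2015-style input: position, then <predicate> unless position is :g, then identifier.
def related_input_2015(position, predicate_iri, identifier)
  predicate = position == :g ? "" : "<#{predicate_iri}>"
  "#{position}#{predicate}#{identifier}"
end

# URGNA2012-style input: direction (:p or :r), bare predicate IRI, then identifier.
def related_input_2012(direction, predicate_iri, identifier)
  "#{direction}#{predicate_iri}#{identifier}"
end

# Both variants in this release hash the input with SHA-1.
def related_hash(input)
  Digest::SHA1.hexdigest(input)
end
```

Because the inputs differ, the two algorithms generally produce different hashes for the same related node even though they share the same digest function here.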
data/lib/rdf/normalize/utils.rb ADDED
@@ -0,0 +1,33 @@
1
+ module RDF::Normalize
2
+ module Utils
3
+ # Add debug event to debug array, if specified
4
+ #
5
+ # @param [String] message
6
+ # @yieldreturn [String] appended to message, to allow for lazy evaluation of message
7
+ def debug(*args)
8
+ options = args.last.is_a?(Hash) ? args.pop : {}
9
+ return unless options[:debug] || @options[:debug]
10
+ depth = options[:depth] || @options[:depth]
11
+ d_str = depth > 100 ? ' ' * 100 + '+' : ' ' * depth
12
+ list = args
13
+ list << yield if block_given?
14
+ message = d_str + (list.empty? ? "" : list.join(": "))
15
+ options[:debug] << message if options[:debug].is_a?(Array)
16
+ @options[:debug] << message if @options[:debug].is_a?(Array)
17
+ $stderr.puts(message) if @options[:debug] == true
18
+ end
19
+ module_function :debug
20
+
21
+ # Increase depth around a method invocation
22
+ # @yield
23
+ # Yields with no arguments
24
+ # @yieldreturn [Object] returns the result of yielding
25
+ # @return [Object]
26
+ def depth
27
+ @options[:depth] += 1
28
+ ret = yield
29
+ @options[:depth] -= 1
30
+ ret
31
+ end
32
+ end
33
+ end
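The `depth` helper above increments a counter around a block and decrements it afterwards, but the decrement is skipped if the block raises. One possible variant, sketched here under the assumption that restoring the depth on error is desirable, uses `ensure`. `DepthTracker` and `nested` are hypothetical names, not part of the gem.

```ruby
# Hypothetical variant of the depth-tracking idea with exception safety.
class DepthTracker
  attr_reader :depth

  def initialize
    @depth = 0
  end

  # Runs the block one level deeper and returns the block's result.
  def nested
    @depth += 1
    yield
  ensure
    # Runs on normal return and on exceptions; does not change the return value.
    @depth -= 1
  end
end
```

With this shape, a failure deep inside a recursive call still leaves the indentation counter where it started.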
data/lib/rdf/normalize/writer.rb ADDED
@@ -0,0 +1,79 @@
1
+ module RDF::Normalize
2
+ ##
3
+ # An RDF Graph normalization serializer.
4
+ #
5
+ # Normalizes the enumerated statements into normal form in the form of N-Quads.
6
+ #
7
+ # @author [Gregg Kellogg](http://kellogg-assoc.com/)
8
+ class Writer < RDF::NQuads::Writer
9
+ format RDF::Normalize::Format
10
+
11
+ # @attr_accessor [RDF::Repository] Repository of statements to be serialized
12
+ attr_accessor :repo
13
+
14
+ ##
15
+ # Initializes the writer instance.
16
+ #
17
+ # @param [IO, File] output
18
+ # the output stream
19
+ # @param [Hash{Symbol => Object}] options
20
+ # any additional options
21
+ # @yield [writer] `self`
22
+ # @yieldparam [RDF::Writer] writer
23
+ # @yieldreturn [void]
24
+ # @yield [writer]
25
+ # @yieldparam [RDF::Writer] writer
26
+ def initialize(output = $stdout, options = {}, &block)
27
+ super do
28
+ @options[:depth] ||= 0
29
+ @repo = RDF::Repository.new
30
+ if block_given?
31
+ case block.arity
32
+ when 0 then instance_eval(&block)
33
+ else block.call(self)
34
+ end
35
+ end
36
+ end
37
+ end
38
+
39
+ ##
40
+ # Defer writing to epilogue
41
+ def write_statement(statement)
42
+ self
43
+ end
44
+
45
+ ##
46
+ # Outputs the Graph representation of all stored triples.
47
+ #
48
+ # @return [void]
49
+ def write_epilogue
50
+ statements = RDF::Normalize.new(@repo, @options).
51
+ statements.
52
+ reject(&:variable?).
53
+ map {|s| format_statement(s)}.
54
+ sort.
55
+ each do |line|
56
+ puts line
57
+ end
58
+ end
59
+
60
+ protected
61
+
62
+ ##
63
+ # Adds a statement to be serialized
64
+ # @param [RDF::Statement] statement
65
+ # @return [void]
66
+ def insert_statement(statement)
67
+ @repo.insert(statement)
68
+ end
69
+
70
+ ##
71
+ # Insert an Enumerable
72
+ #
73
+ # @param [RDF::Enumerable] enumerable
74
+ # @return [void]
75
+ def insert_statements(enumerable)
76
+ @repo = enumerable
77
+ end
78
+ end
79
+ end
metadata ADDED
@@ -0,0 +1,160 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: rdf-normalize
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Gregg Kellogg
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2015-05-20 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rdf
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.1'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.1'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rdf-spec
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '1.1'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '1.1'
41
+ - !ruby/object:Gem::Dependency
42
+ name: open-uri-cached
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '0.0'
48
+ - - ">="
49
+ - !ruby/object:Gem::Version
50
+ version: 0.0.5
51
+ type: :development
52
+ prerelease: false
53
+ version_requirements: !ruby/object:Gem::Requirement
54
+ requirements:
55
+ - - "~>"
56
+ - !ruby/object:Gem::Version
57
+ version: '0.0'
58
+ - - ">="
59
+ - !ruby/object:Gem::Version
60
+ version: 0.0.5
61
+ - !ruby/object:Gem::Dependency
62
+ name: rspec
63
+ requirement: !ruby/object:Gem::Requirement
64
+ requirements:
65
+ - - "~>"
66
+ - !ruby/object:Gem::Version
67
+ version: '3.2'
68
+ type: :development
69
+ prerelease: false
70
+ version_requirements: !ruby/object:Gem::Requirement
71
+ requirements:
72
+ - - "~>"
73
+ - !ruby/object:Gem::Version
74
+ version: '3.2'
75
+ - !ruby/object:Gem::Dependency
76
+ name: webmock
77
+ requirement: !ruby/object:Gem::Requirement
78
+ requirements:
79
+ - - "~>"
80
+ - !ruby/object:Gem::Version
81
+ version: '1.17'
82
+ type: :development
83
+ prerelease: false
84
+ version_requirements: !ruby/object:Gem::Requirement
85
+ requirements:
86
+ - - "~>"
87
+ - !ruby/object:Gem::Version
88
+ version: '1.17'
89
+ - !ruby/object:Gem::Dependency
90
+ name: json-ld
91
+ requirement: !ruby/object:Gem::Requirement
92
+ requirements:
93
+ - - "~>"
94
+ - !ruby/object:Gem::Version
95
+ version: '1.1'
96
+ type: :development
97
+ prerelease: false
98
+ version_requirements: !ruby/object:Gem::Requirement
99
+ requirements:
100
+ - - "~>"
101
+ - !ruby/object:Gem::Version
102
+ version: '1.1'
103
+ - !ruby/object:Gem::Dependency
104
+ name: yard
105
+ requirement: !ruby/object:Gem::Requirement
106
+ requirements:
107
+ - - "~>"
108
+ - !ruby/object:Gem::Version
109
+ version: '0.8'
110
+ type: :development
111
+ prerelease: false
112
+ version_requirements: !ruby/object:Gem::Requirement
113
+ requirements:
114
+ - - "~>"
115
+ - !ruby/object:Gem::Version
116
+ version: '0.8'
117
+ description: RDF::Normalize is a Graph normalizer for the RDF.rb library suite.
118
+ email: public-rdf-ruby@w3.org
119
+ executables: []
120
+ extensions: []
121
+ extra_rdoc_files: []
122
+ files:
123
+ - AUTHORS
124
+ - LICENSE
125
+ - README.md
126
+ - VERSION
127
+ - lib/rdf/normalize.rb
128
+ - lib/rdf/normalize/base.rb
129
+ - lib/rdf/normalize/carroll2001.rb
130
+ - lib/rdf/normalize/format.rb
131
+ - lib/rdf/normalize/urdna2015.rb
132
+ - lib/rdf/normalize/urgna2012.rb
133
+ - lib/rdf/normalize/utils.rb
134
+ - lib/rdf/normalize/writer.rb
135
+ homepage: http://github.com/gkellogg/rdf-normalize
136
+ licenses:
137
+ - Public Domain
138
+ metadata: {}
139
+ post_install_message:
140
+ rdoc_options: []
141
+ require_paths:
142
+ - lib
143
+ required_ruby_version: !ruby/object:Gem::Requirement
144
+ requirements:
145
+ - - ">="
146
+ - !ruby/object:Gem::Version
147
+ version: 1.9.2
148
+ required_rubygems_version: !ruby/object:Gem::Requirement
149
+ requirements:
150
+ - - ">="
151
+ - !ruby/object:Gem::Version
152
+ version: '0'
153
+ requirements: []
154
+ rubyforge_project: rdf-normalize
155
+ rubygems_version: 2.4.7
156
+ signing_key:
157
+ specification_version: 4
158
+ summary: RDF Graph normalizer for Ruby.
159
+ test_files: []
160
+ has_rdoc: false