rdf-normalize 0.5.0 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 60932e4daabb349e76fb838c46aeea30ea39ff2406d7fd2f08b4df20b22975e4
4
- data.tar.gz: d1f1b87ac1a46ad18c7a1e848c8e7306b20cedbdb374cc1a197f8a2235875b27
3
+ metadata.gz: 15478756de443574bde6120436faf09bec1f7e40dcfc60f39fc97af92e686738
4
+ data.tar.gz: d5617da52a4d7e3429452f691e4a9ccb7f6ac8bedcef6dd66583b1322e0b57f0
5
5
  SHA512:
6
- metadata.gz: 263602b7c6861bf74745600e5828f5095cb9a3a3f08974621d050982a78bc6d4d6e1c6e421cc69e5bc9fa0146cfd0521b0f4817989b8644b4d25fa02c13015dd
7
- data.tar.gz: 98a1391dd232a2db5ea8353a51268fcfe9def16dc7a8f309bb305c7feeb9df85f8e96c8cc5dbcf4b3486e34d0133b33bdb353d2aa567bb948d412bb36be5e202
6
+ metadata.gz: 7c2ccd4449f12d5095702d19a8c1d27539aa5afa23c8b96ffcf6f43ee0d6d10fd763e2dbc98f2ef008ede3edc3fda1801eb6a1cd3ad0e80e3b82995017ae93e4
7
+ data.tar.gz: f760c7336703292679c82b6abbea86ffe7b8ac1b803508c187d8aee7bcd8cd635d0b039d928b7d145198f7df884027aeb911fa2e97e0e9d171cae92e4d26ed0b
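checksums.yaml records SHA256 and SHA512 digests of the two archives packed inside the .gem (metadata.gz and data.tar.gz). Any change to the packaged content changes the digests, which is why every value differs between 0.5.0 and 0.6.0. The sketch below shows how such a manifest could be checked with Ruby's standard Digest and YAML libraries. It assumes the archive members have already been read into memory. The `verify_checksums` helper, the `DIGESTS` table and the sample bytes are hypothetical and are not part of rdf-normalize or the RubyGems API.

```ruby
require 'digest'
require 'yaml'

# Hypothetical mapping from manifest section names to digest classes.
DIGESTS = {
  "SHA256" => Digest::SHA256,
  "SHA512" => Digest::SHA512
}.freeze

# Hypothetical helper: true only if every listed member exists and matches.
# manifest: {"SHA256" => {"metadata.gz" => hex, ...}, "SHA512" => {...}}
# members:  {"metadata.gz" => String, "data.tar.gz" => String}
def verify_checksums(manifest, members)
  manifest.all? do |algo, entries|
    digest_class = DIGESTS.fetch(algo)
    entries.all? do |name, expected|
      data = members[name]
      # A member missing from the archive counts as a failure.
      !data.nil? && digest_class.hexdigest(data) == expected
    end
  end
end

# Usage with made-up member contents; the values are quoted so YAML
# always loads them as strings.
members = { "metadata.gz" => "meta bytes", "data.tar.gz" => "data bytes" }
manifest = YAML.safe_load(<<~YAML)
  ---
  SHA256:
    metadata.gz: "#{Digest::SHA256.hexdigest("meta bytes")}"
    data.tar.gz: "#{Digest::SHA256.hexdigest("data bytes")}"
  SHA512:
    metadata.gz: "#{Digest::SHA512.hexdigest("meta bytes")}"
    data.tar.gz: "#{Digest::SHA512.hexdigest("data bytes")}"
YAML

puts verify_checksums(manifest, members)
```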
data/README.md CHANGED
@@ -1,13 +1,13 @@
1
1
  # RDF::Normalize
2
2
  RDF Graph normalizer for [RDF.rb][RDF.rb].
3
3
 
4
- [![Gem Version](https://badge.fury.io/rb/rdf-normalize.png)](https://badge.fury.io/rb/rdf-normalize)
4
+ [![Gem Version](https://badge.fury.io/rb/rdf-normalize.svg)](https://badge.fury.io/rb/rdf-normalize)
5
5
  [![Build Status](https://github.com/ruby-rdf/rdf-normalize/workflows/CI/badge.svg?branch=develop)](https://github.com/ruby-rdf/rdf-normalize/actions?query=workflow%3ACI)
6
6
  [![Coverage Status](https://coveralls.io/repos/ruby-rdf/rdf-normalize/badge.svg?branch=develop)](https://coveralls.io/github/ruby-rdf/rdf-normalize?branch=develop)
7
7
  [![Gitter chat](https://badges.gitter.im/ruby-rdf/rdf.png)](https://gitter.im/ruby-rdf/rdf)
8
8
 
9
9
  ## Description
10
- This is a [Ruby][] implementation of a [RDF Normalize][] for [RDF.rb][].
10
+ This is a [Ruby][] implementation of a [RDF Dataset Canonicalization][] for [RDF.rb][].
11
11
 
12
12
  ## Features
13
13
  RDF::Normalize generates normalized [N-Quads][] output for an RDF Dataset using the algorithm
@@ -16,8 +16,8 @@ to serialize normalized statements.
16
16
 
17
17
  Algorithms implemented:
18
18
 
19
- * [URGNA2012](https://json-ld.github.io/normalization/spec/index.html#dfn-urgna2012)
20
- * [URDNA2015](https://json-ld.github.io/normalization/spec/index.html#dfn-urdna2015)
19
+ * [URGNA2012](https://www.w3.org/TR/rdf-canon/#dfn-urgna2012)
20
+ * [RDFC-1.0](https://www.w3.org/TR/rdf-canon/#dfn-rdfc-1-0)
21
21
 
22
22
  Install with `gem install rdf-normalize`
23
23
 
@@ -27,7 +27,17 @@ Install with `gem install rdf-normalize`
27
27
  ## Usage
28
28
 
29
29
  ## Documentation
30
- Full documentation available on [Rubydoc.info][Normalize doc]
30
+
31
+ Full documentation available on [GitHub][Normalize doc]
32
+
33
+ ## Examples
34
+
35
+ ### Returning normalized N-Quads
36
+
37
+ require 'rdf/normalize'
38
+ require 'rdf/turtle'
39
+ g = RDF::Graph.load("etc/doap.ttl")
40
+ puts g.dump(:normalize)
31
41
 
32
42
  ### Principle Classes
33
43
  * {RDF::Normalize}
@@ -35,8 +45,7 @@ Full documentation available on [Rubydoc.info][Normalize doc]
35
45
  * {RDF::Normalize::Format}
36
46
  * {RDF::Normalize::Writer}
37
47
  * {RDF::Normalize::URGNA2012}
38
- * {RDF::Normalize::URDNA2015}
39
-
48
+ * {RDF::Normalize::RDFC10}
40
49
 
41
50
  ## Dependencies
42
51
 
@@ -80,7 +89,7 @@ see <https://unlicense.org/> or the accompanying {file:LICENSE} file.
80
89
  [YARD]: https://yardoc.org/
81
90
  [YARD-GS]: https://rubydoc.info/docs/yard/file/docs/GettingStarted.md
82
91
  [PDD]: https://unlicense.org/#unlicensing-contributions
83
- [RDF.rb]: https://rubydoc.info/github/ruby-rdf/rdf-normalize
92
+ [RDF.rb]: https://ruby-rdf.github.io/rdf-normalize
84
93
  [N-Triples]: https://www.w3.org/TR/rdf-testcases/#ntriples
85
- [RDF Normalize]:https://json-ld.github.io/normalization/spec/
86
- [Normalize doc]:https://rubydoc.info/github/ruby-rdf/rdf-normalize/master
94
+ [RDF Dataset Canonicalization]: https://www.w3.org/TR/rdf-canon/
95
+ [Normalize doc]: https://ruby-rdf.github.io/rdf-normalize/
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.5.0
1
+ 0.6.0
data/lib/rdf/normalize/rdfc10.rb ADDED
@@ -0,0 +1,390 @@
1
+ require 'rdf/nquads'
2
+ begin
3
+ require 'json'
4
+ rescue LoadError
5
+ # Used for debug output
6
+ end
7
+
8
+ module RDF::Normalize
9
+ class RDFC10
10
+ include RDF::Enumerable
11
+ include RDF::Util::Logger
12
+ include Base
13
+
14
+ ##
15
+ # Create an enumerable with grounded nodes
16
+ #
17
+ # @param [RDF::Enumerable] enumerable
18
+ # @return [RDF::Enumerable]
19
+ def initialize(enumerable, **options)
20
+ @dataset, @options = enumerable, options
21
+ end
22
+
23
+ def each(&block)
24
+ ns = NormalizationState.new(@options)
25
+ log_debug("ca:")
26
+ log_debug(" log point", "Entering the canonicalization function (4.5.3).")
27
+ log_depth(depth: 2) {normalize_statements(ns, &block)}
28
+ end
29
+
30
+ protected
31
+ def normalize_statements(ns, &block)
32
+ # Step 2: Map BNodes to the statements they are used by
33
+ dataset.each_statement do |statement|
34
+ statement.to_quad.compact.select(&:node?).each do |node|
35
+ ns.add_statement(node, statement)
36
+ end
37
+ end
38
+ log_debug("ca.2:")
39
+ log_debug(" log point", "Extract quads for each bnode (4.5.3 (2)).")
40
+ log_debug(" Bnode to quads:")
41
+ if logger && logger.level == 0
42
+ ns.bnode_to_statements.each do |bn, statements|
43
+ log_debug(" #{bn.id}:")
44
+ statements.each do |s|
45
+ log_debug {" - #{s.to_nquads.strip}"}
46
+ end
47
+ end
48
+ end
49
+
50
+ ns.hash_to_bnodes = {}
51
+
52
+ # Step 3: Calculate hashes for first degree nodes
53
+ log_debug("ca.3:")
54
+ log_debug(" log point", "Calculated first degree hashes (4.5.3 (3)).")
55
+ log_debug(" with:")
56
+ ns.bnode_to_statements.each_key do |node|
57
+ log_debug(" - identifier") {node.id}
58
+ log_debug(" h1dq:")
59
+ hash = log_depth(depth: 8) {ns.hash_first_degree_quads(node)}
60
+ ns.add_bnode_hash(node, hash)
61
+ end
62
+
63
+ # Step 4: Create canonical replacements for hashes mapping to a single node
64
+ log_debug("ca.4:")
65
+ log_debug(" log point", "Create canonical replacements for hashes mapping to a single node (4.5.3 (4)).")
66
+ log_debug(" with:") unless ns.hash_to_bnodes.empty?
67
+ ns.hash_to_bnodes.keys.sort.each do |hash|
68
+ identifier_list = ns.hash_to_bnodes[hash]
69
+ next if identifier_list.length > 1
70
+ node = identifier_list.first
71
+ id = ns.canonical_issuer.issue_identifier(node)
72
+ log_debug(" - identifier") {node.id}
73
+ log_debug(" hash", hash)
74
+ log_debug(" canonical label", id)
75
+ ns.hash_to_bnodes.delete(hash)
76
+ end
77
+
78
+ # Step 5: Iterate over hashs having more than one node
79
+ log_debug("ca.5:") unless ns.hash_to_bnodes.empty?
80
+ log_debug(" log point", "Calculate hashes for identifiers with shared hashes (4.5.3 (5)).")
81
+ log_debug(" with:") unless ns.hash_to_bnodes.empty?
82
+ ns.hash_to_bnodes.keys.sort.each do |hash|
83
+ identifier_list = ns.hash_to_bnodes[hash]
84
+
85
+ log_debug(" - hash", hash)
86
+ log_debug(" identifier list") {identifier_list.map(&:id).to_json(indent: ' ')}
87
+ hash_path_list = []
88
+
89
+ # Create a hash_path_list for all bnodes using a temporary identifier used to create canonical replacements
90
+ log_debug(" ca.5.2:")
91
+ log_debug(" log point", "Calculate hashes for identifiers with shared hashes (4.5.3 (5.2)).")
92
+ log_debug(" with:") unless identifier_list.empty?
93
+ identifier_list.each do |identifier|
94
+ next if ns.canonical_issuer.issued.include?(identifier)
95
+ temporary_issuer = IdentifierIssuer.new("b")
96
+ temporary_issuer.issue_identifier(identifier)
97
+ log_debug(" - identifier") {identifier.id}
98
+ hash_path_list << log_depth(depth: 12) {ns.hash_n_degree_quads(identifier, temporary_issuer)}
99
+ end
100
+
101
+ # Create canonical replacements for nodes
102
+ log_debug(" ca.5.3:") unless hash_path_list.empty?
103
+ log_debug(" log point", "Canonical identifiers for temporary identifiers (4.5.3 (5.3)).")
104
+ log_debug(" issuer:") unless hash_path_list.empty?
105
+ hash_path_list.sort_by(&:first).each do |result, issuer|
106
+ issuer.issued.each do |node|
107
+ id = ns.canonical_issuer.issue_identifier(node)
108
+ log_debug(" - blank node") {node.id}
109
+ log_debug(" canonical identifier", id)
110
+ end
111
+ end
112
+ end
113
+
114
+ # Step 6: Yield statements using BNodes from canonical replacements
115
+ dataset.each_statement do |statement|
116
+ if statement.has_blank_nodes?
117
+ quad = statement.to_quad.compact.map do |term|
118
+ term.node? ? RDF::Node.intern(ns.canonical_issuer.identifier(term)) : term
119
+ end
120
+ block.call RDF::Statement.from(quad)
121
+ else
122
+ block.call statement
123
+ end
124
+ end
125
+
126
+ log_debug("ca.6:")
127
+ log_debug(" log point", "Replace original with canonical labels (4.5.3 (6)).")
128
+ log_debug(" canonical issuer: #{ns.canonical_issuer.inspect}")
129
+ dataset
130
+ end
131
+
132
+ private
133
+
134
+ class NormalizationState
135
+ include RDF::Util::Logger
136
+
137
+ attr_accessor :bnode_to_statements
138
+ attr_accessor :hash_to_bnodes
139
+ attr_accessor :canonical_issuer
140
+
141
+ def initialize(options)
142
+ @options = options
143
+ @bnode_to_statements, @hash_to_bnodes, @canonical_issuer = {}, {}, IdentifierIssuer.new("c14n")
144
+ end
145
+
146
+ def add_statement(node, statement)
147
+ bnode_to_statements[node] ||= []
148
+ bnode_to_statements[node] << statement unless bnode_to_statements[node].any? {|st| st.eql?(statement)}
149
+ end
150
+
151
+ def add_bnode_hash(node, hash)
152
+ hash_to_bnodes[hash] ||= []
153
+ # Match on object IDs of nodes, rather than simple node equality
154
+ hash_to_bnodes[hash] << node unless hash_to_bnodes[hash].any? {|n| n.eql?(node)}
155
+ end
156
+
157
+ # This algorithm calculates a hash for a given blank node across the quads in a dataset in which that blank node is a component. If the hash uniquely identifies that blank node, no further examination is necessary. Otherwise, a hash will be created for the blank node using the algorithm in [4.9 Hash N-Degree Quads](https://w3c.github.io/rdf-canon/spec/#hash-nd-quads) invoked via [4.5 Canonicalization Algorithm](https://w3c.github.io/rdf-canon/spec/#canon-algorithm).
158
+ #
159
+ # @param [RDF::Node] node The reference blank node identifier
160
+ # @return [String] the SHA256 hexdigest hash of statements using this node, with replacements
161
+ def hash_first_degree_quads(node)
162
+ nquads = bnode_to_statements[node].
163
+ map do |statement|
164
+ quad = statement.to_quad.map do |t|
165
+ case t
166
+ when node then RDF::Node("a")
167
+ when RDF::Node then RDF::Node("z")
168
+ else t
169
+ end
170
+ end
171
+ RDF::Statement.from(quad).to_nquads
172
+ end
173
+ log_debug("log point", "Hash First Degree Quads function (4.7.3).")
174
+ log_debug("nquads:")
175
+ nquads.each do |q|
176
+ log_debug {" - #{q.strip}"}
177
+ end
178
+
179
+ result = hexdigest(nquads.sort.join)
180
+ log_debug("hash") {result}
181
+ result
182
+ end
183
+
184
+ # @param [RDF::Node] related
185
+ # @param [RDF::Statement] statement
186
+ # @param [IdentifierIssuer] issuer
187
+ # @param [String] position one of :s, :o, or :g
188
+ # @return [String] the SHA256 hexdigest hash
189
+ def hash_related_node(related, statement, issuer, position)
190
+ log_debug("related") {related.id}
191
+ input = "#{position}"
192
+ input << statement.predicate.to_ntriples unless position == :g
193
+ if identifier = (canonical_issuer.identifier(related) ||
194
+ issuer.identifier(related))
195
+ input << "_:#{identifier}"
196
+ else
197
+ log_debug("h1dq:")
198
+ input << log_depth(depth: 2) do
199
+ hash_first_degree_quads(related)
200
+ end
201
+ end
202
+ log_debug("input") {input.inspect}
203
+ log_debug("hash") {hexdigest(input)}
204
+ hexdigest(input)
205
+ end
206
+
207
+ # @param [RDF::Node] identifier
208
+ # @param [IdentifierIssuer] issuer
209
+ # @return [Array<String,IdentifierIssuer>] the Hash and issuer
210
+ def hash_n_degree_quads(identifier, issuer)
211
+ log_debug("hndq:")
212
+ log_debug(" log point", "Hash N-Degree Quads function (4.9.3).")
213
+ log_debug(" identifier") {identifier.id}
214
+ log_debug(" issuer") {issuer.inspect}
215
+
216
+ # hash to related blank nodes map
217
+ hn = {}
218
+
219
+ log_debug(" hndq.2:")
220
+ log_debug(" log point", "Quads for identifier (4.9.3 (2)).")
221
+ log_debug(" quads:")
222
+ bnode_to_statements[identifier].each do |s|
223
+ log_debug {" - #{s.to_nquads.strip}"}
224
+ end
225
+
226
+ # Step 3
227
+ log_debug(" hndq.3:")
228
+ log_debug(" log point", "Hash N-Degree Quads function (4.9.3 (3)).")
229
+ log_debug(" with:") unless bnode_to_statements[identifier].empty?
230
+ bnode_to_statements[identifier].each do |statement|
231
+ log_debug {" - quad: #{statement.to_nquads.strip}"}
232
+ log_debug(" hndq.3.1:")
233
+ log_debug(" log point", "Hash related bnode component (4.9.3 (3.1))")
234
+ log_depth(depth: 10) {hash_related_statement(identifier, statement, issuer, hn)}
235
+ end
236
+ log_debug(" Hash to bnodes:")
237
+ hn.each do |k,v|
238
+ log_debug(" #{k}:")
239
+ v.each do |vv|
240
+ log_debug(" - #{vv.id}")
241
+ end
242
+ end
243
+
244
+ data_to_hash = ""
245
+
246
+ # Step 5
247
+ log_debug(" hndq.5:")
248
+ log_debug(" log point", "Hash N-Degree Quads function (4.9.3 (5)), entering loop.")
249
+ log_debug(" with:")
250
+ hn.keys.sort.each do |hash|
251
+ log_debug(" - related hash", hash)
252
+ log_debug(" data to hash") {data_to_hash.to_json}
253
+ list = hn[hash]
254
+ # Iterate over related nodes
255
+ chosen_path, chosen_issuer = "", nil
256
+ data_to_hash += hash
257
+
258
+ log_debug(" hndq.5.4:")
259
+ log_debug(" log point", "Hash N-Degree Quads function (4.9.3 (5.4)), entering loop.")
260
+ log_debug(" with:") unless list.empty?
261
+ list.permutation do |permutation|
262
+ log_debug(" - perm") {permutation.map(&:id).to_json(indent: ' ', space: ' ')}
263
+ issuer_copy, path, recursion_list = issuer.dup, "", []
264
+
265
+ log_debug(" hndq.5.4.4:")
266
+ log_debug(" log point", "Hash N-Degree Quads function (4.9.3 (5.4.4)), entering loop.")
267
+ log_debug(" with:")
268
+ permutation.each do |related|
269
+ log_debug(" - related") {related.id}
270
+ log_debug(" path") {path.to_json}
271
+ if canonical_issuer.identifier(related)
272
+ path << '_:' + canonical_issuer.issue_identifier(related)
273
+ else
274
+ recursion_list << related if !issuer_copy.identifier(related)
275
+ path << '_:' + issuer_copy.issue_identifier(related)
276
+ end
277
+
278
+ # Skip to the next permutation if chosen path isn't empty and the path is greater than the chosen path
279
+ break if !chosen_path.empty? && path.length >= chosen_path.length
280
+ end
281
+
282
+ log_debug(" hndq.5.4.5:")
283
+ log_debug(" log point", "Hash N-Degree Quads function (4.9.3 (5.4.5)), before possible recursion.")
284
+ log_debug(" recursion list") {recursion_list.map(&:id).to_json(indent: ' ')}
285
+ log_debug(" path") {path.to_json}
286
+ log_debug(" with:") unless recursion_list.empty?
287
+ recursion_list.each do |related|
288
+ log_debug(" - related") {related.id}
289
+ result = log_depth(depth: 18) {hash_n_degree_quads(related, issuer_copy)}
290
+ path << '_:' + issuer_copy.issue_identifier(related)
291
+ path << "<#{result.first}>"
292
+ issuer_copy = result.last
293
+ log_debug(" hndq.5.4.5.4:")
294
+ log_debug(" log point", "Hash N-Degree Quads function (4.9.3 (5.4.5.4)), combine result of recursion.")
295
+ log_debug(" path") {path.to_json}
296
+ log_debug(" issuer copy") {issuer_copy.inspect}
297
+ break if !chosen_path.empty? && path.length >= chosen_path.length && path > chosen_path
298
+ end
299
+
300
+ if chosen_path.empty? || path < chosen_path
301
+ chosen_path, chosen_issuer = path, issuer_copy
302
+ end
303
+ end
304
+
305
+ data_to_hash += chosen_path
306
+ log_debug(" hndq.5.5:")
307
+ log_debug(" log point", "Hash N-Degree Quads function (4.9.3 (5.5). End of current loop with Hn hashes.")
308
+ log_debug(" chosen path") {chosen_path.to_json}
309
+ log_debug(" data to hash") {data_to_hash.to_json}
310
+ issuer = chosen_issuer
311
+ end
312
+
313
+ log_debug(" hndq.6:")
314
+ log_debug(" log point", "Leaving Hash N-Degree Quads function (4.9.3).")
315
+ log_debug(" hash") {hexdigest(data_to_hash)}
316
+ log_depth(depth: 4) {log_debug("issuer") {issuer.inspect}}
317
+ return [hexdigest(data_to_hash), issuer]
318
+ end
319
+
320
+ def inspect
321
+ "NormalizationState:\nbnode_to_statements: #{inspect_bnode_to_statements}\nhash_to_bnodes: #{inspect_hash_to_bnodes}\ncanonical_issuer: #{canonical_issuer.inspect}"
322
+ end
323
+
324
+ def inspect_bnode_to_statements
325
+ bnode_to_statements.map do |n, statements|
326
+ "#{n.id}: #{statements.map {|s| s.to_nquads.strip}}"
327
+ end.join(", ")
328
+ end
329
+
330
+ def inspect_hash_to_bnodes
331
+ end
332
+
333
+ protected
334
+
335
+ def hexdigest(val)
336
+ Digest::SHA256.hexdigest(val)
337
+ end
338
+
339
+ # Group adjacent bnodes by hash
340
+ def hash_related_statement(identifier, statement, issuer, map)
341
+ log_debug("with:") if statement.to_h.values.any? {|t| t.is_a?(RDF::Node)}
342
+ statement.to_h(:s, :p, :o, :g).each do |pos, term|
343
+ next if !term.is_a?(RDF::Node) || term == identifier
344
+
345
+ log_debug(" - position", pos)
346
+ hash = log_depth(depth: 4) {hash_related_node(term, statement, issuer, pos)}
347
+ map[hash] ||= []
348
+ map[hash] << term unless map[hash].any? {|n| n.eql?(term)}
349
+ end
350
+ end
351
+ end
352
+
353
+ class IdentifierIssuer
354
+ def initialize(prefix = "c14n")
355
+ @prefix, @counter, @issued = prefix, 0, {}
356
+ end
357
+
358
+ # Return an identifier for this BNode
359
+ # @param [RDF::Node] node
360
+ # @return [String] Canonical identifier for node
361
+ def issue_identifier(node)
362
+ @issued[node] ||= begin
363
+ res, @counter = @prefix + @counter.to_s, @counter + 1
364
+ res
365
+ end
366
+ end
367
+
368
+ def issued
369
+ @issued.keys
370
+ end
371
+
372
+ # @return [RDF::Node] Canonical identifier assigned to node
373
+ def identifier(node)
374
+ @issued[node]
375
+ end
376
+
377
+ # Duplicate this issuer, ensuring that the issued identifiers remain distinct
378
+ # @return [IdentifierIssuer]
379
+ def dup
380
+ other = super
381
+ other.instance_variable_set(:@issued, @issued.dup)
382
+ other
383
+ end
384
+
385
+ def inspect
386
+ "{#{@issued.map {|k,v| "#{k.id}: #{v}"}.join(', ')}}"
387
+ end
388
+ end
389
+ end
390
+ end
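The new RDFC10 class computes Hash First Degree Quads in three steps. It serializes each quad that mentions a blank node, writing that node as `_:a` and every other blank node as `_:z`. It then sorts the resulting N-Quads lines and takes a SHA-256 hex digest of their concatenation. IdentifierIssuer hands out `c14n0`, `c14n1`, and so on, in first-come order. The sketch below reproduces these two pieces without dependencies. It assumes quads are plain arrays of already-serialized N-Quads term strings (blank nodes written as `_:label`) rather than RDF.rb objects. `first_degree_hash`, `SimpleIssuer` and the sample data are illustrative and are not the gem's API.

```ruby
require 'digest'

# Hypothetical stand-in for RDFC10's IdentifierIssuer: hands out
# prefix + counter and returns the same label for a node seen before.
class SimpleIssuer
  def initialize(prefix = "c14n")
    @prefix, @counter, @issued = prefix, 0, {}
  end

  def issue_identifier(node)
    @issued[node] ||= begin
      id = "#{@prefix}#{@counter}"
      @counter += 1
      id
    end
  end

  def identifier(node)
    @issued[node]
  end

  # Copy with its own issued map, as IdentifierIssuer#dup does, so labels
  # issued on the copy do not leak back into the original.
  def dup
    copy = super
    copy.instance_variable_set(:@issued, @issued.dup)
    copy
  end
end

# Sketch of Hash First Degree Quads over quads given as arrays of
# N-Quads term strings ([s, p, o, g], g = nil for the default graph).
def first_degree_hash(reference, quads)
  lines = quads.select { |quad| quad.include?(reference) }.map do |quad|
    terms = quad.compact.map do |term|
      if term == reference then "_:a"           # the node being hashed
      elsif term.start_with?("_:") then "_:z"   # any other blank node
      else term
      end
    end
    terms.join(" ") + " .\n"
  end
  Digest::SHA256.hexdigest(lines.sort.join)
end

quads = [
  ["_:x", "<http://example.org/p>", "_:y", nil],
  ["_:y", "<http://example.org/q>", "\"v\"", nil]
]
hashes = { "_:x" => first_degree_hash("_:x", quads),
           "_:y" => first_degree_hash("_:y", quads) }

# Step 4 analogue: nodes with unique hashes get canonical labels in hash
# order, so the node with the smaller hash receives c14n0.
canonical = SimpleIssuer.new
hashes.sort_by { |_node, h| h }.each { |node, _h| canonical.issue_identifier(node) }
```

Because the hash input never contains the original labels, renaming `_:x`/`_:y` to any other labels yields the same digests. This property is what makes the canonical labels stable.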
data/lib/rdf/normalize/urgna2012.rb CHANGED
@@ -1,12 +1,12 @@
1
1
  module RDF::Normalize
2
- class URGNA2012 < URDNA2015
2
+ class URGNA2012 < RDFC10
3
3
 
4
4
  def each(&block)
5
5
  ns = NormalizationState.new(@options)
6
6
  normalize_statements(ns, &block)
7
7
  end
8
8
 
9
- class NormalizationState < URDNA2015::NormalizationState
9
+ class NormalizationState < RDFC10::NormalizationState
10
10
  protected
11
11
 
12
12
  # 2012 version uses SHA-1
@@ -23,9 +23,7 @@ module RDF::Normalize
23
23
  identifier = canonical_issuer.identifier(related) ||
24
24
  issuer.identifier(related) ||
25
25
  hash_first_degree_quads(related)
26
- input = position.to_s
27
- input << statement.predicate.to_s
28
- input << identifier
26
+ input = "#{position}#{statement.predicate}#{identifier}"
29
27
  log_debug("hrel") {"input: #{input.inspect}, hash: #{hexdigest(input)}"}
30
28
  hexdigest(input)
31
29
  end
@@ -35,11 +33,11 @@ module RDF::Normalize
35
33
  if statement.subject.node? && statement.subject != identifier
36
34
  hash = log_depth {hash_related_node(statement.subject, statement, issuer, :p)}
37
35
  map[hash] ||= []
38
- map[hash] << statement.subject unless map[hash].include?(statement.subject)
36
+ map[hash] << statement.subject unless map[hash].any? {|n| n.eql?(statement.subject)}
39
37
  elsif statement.object.node? && statement.object != identifier
40
38
  hash = log_depth {hash_related_node(statement.object, statement, issuer, :r)}
41
39
  map[hash] ||= []
42
- map[hash] << statement.object unless map[hash].include?(statement.object)
40
+ map[hash] << statement.object unless map[hash].any? {|n| n.eql?(statement.object)}
43
41
  end
44
42
  end
45
43
  end
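This hunk makes two small changes. First, the related-node input is now built with one string interpolation instead of successive appends. Second, duplicate checks switch from `include?` (which uses `==`) to `any? { |n| n.eql?(...) }`. The sketch below checks that both ways of building the input give the same string, and uses a toy class to show how the duplicate checks can differ. It assumes, as the comment in rdfc10.rb suggests, that `==` may treat distinct node objects as equal while `eql?` does not. The helper names, `ToyNode` and the sample values are hypothetical. The predicate is a bare IRI because URGNA2012 uses `statement.predicate.to_s`.

```ruby
require 'digest'

# 0.5.0 style: start from position.to_s and append. String.new makes an
# explicit mutable copy so the appends never touch a shared string.
def related_input_v05(position, predicate, identifier)
  input = String.new(position.to_s)
  input << predicate
  input << identifier
end

# 0.6.0 style: one interpolated string, no in-place mutation.
def related_input_v06(position, predicate, identifier)
  "#{position}#{predicate}#{identifier}"
end

# URGNA2012 hashes this input with SHA-1 ("2012 version uses SHA-1").
def urgna2012_related_hash(position, predicate, identifier)
  Digest::SHA1.hexdigest(related_input_v06(position, predicate, identifier))
end

# Toy node: == compares labels, eql? keeps Object's identity check.
class ToyNode
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def ==(other)
    other.is_a?(ToyNode) && other.id == id
  end
end

a1 = ToyNode.new("a")
a2 = ToyNode.new("a")    # distinct object, same label
related = [a1]

related.include?(a2)               # => true  (0.5.0 check: a2 would be skipped)
related.any? { |n| n.eql?(a2) }    # => false (0.6.0 check: a2 would be added)
```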
data/lib/rdf/normalize/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  module RDF::Normalize::VERSION
2
- VERSION_FILE = File.join(File.expand_path(File.dirname(__FILE__)), "..", "..", "..", "VERSION")
2
+ VERSION_FILE = File.expand_path("../../../../VERSION", __FILE__)
3
3
  MAJOR, MINOR, TINY, EXTRA = File.read(VERSION_FILE).chop.split(".")
4
4
 
5
5
  STRING = [MAJOR, MINOR, TINY, EXTRA].compact.join('.')
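The new `VERSION_FILE` expression resolves to the same file as the old one. `File.expand_path` treats its second argument as a directory, so the path has to climb out of `version.rb` itself before it climbs the three directory levels, which accounts for the fourth `..`. The check below uses a made-up POSIX install path. The last lines repeat the version parsing with the file contents inlined. Note that `String#chop` removes the last character unconditionally, so this parsing relies on VERSION ending in a newline; `chomp` would also tolerate a file without one.

```ruby
# Hypothetical install location; any absolute path to version.rb works.
version_rb = "/gems/rdf-normalize-0.6.0/lib/rdf/normalize/version.rb"

# 0.5.0: start from the file's directory, then climb three levels.
old_path = File.join(File.expand_path(File.dirname(version_rb)), "..", "..", "..", "VERSION")

# 0.6.0: expand relative to the file itself, so one extra ".." is needed
# to step out of "version.rb" before climbing.
new_path = File.expand_path("../../../../VERSION", version_rb)

puts File.expand_path(old_path)   # /gems/rdf-normalize-0.6.0/VERSION
puts new_path                     # /gems/rdf-normalize-0.6.0/VERSION

# Version parsing as in version.rb, with the VERSION contents inlined.
major, minor, tiny, extra = "0.6.0\n".chop.split(".")
puts [major, minor, tiny, extra].compact.join(".")   # 0.6.0
```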
data/lib/rdf/normalize/writer.rb CHANGED
@@ -4,7 +4,7 @@ module RDF::Normalize
4
4
  #
5
5
  # Normalizes the enumerated statements into normal form in the form of N-Quads.
6
6
  #
7
- # @author [Gregg Kellogg](http://greggkellogg.net/)
7
+ # @author [Gregg Kellogg](https://greggkellogg.net/)
8
8
  class Writer < RDF::NQuads::Writer
9
9
  format RDF::Normalize::Format
10
10
 
@@ -53,7 +53,7 @@ module RDF::Normalize
53
53
  #
54
54
  # @return [void]
55
55
  def write_epilogue
56
- statements = RDF::Normalize.new(@repo, **@options).
56
+ RDF::Normalize.new(@repo, **@options).
57
57
  statements.
58
58
  reject(&:variable?).
59
59
  map {|s| format_statement(s)}.
data/lib/rdf/normalize.rb CHANGED
@@ -1,4 +1,5 @@
1
1
  require 'rdf'
2
+ require 'digest'
2
3
 
3
4
  module RDF
4
5
  ##
@@ -25,13 +26,13 @@ module RDF
25
26
  # writer << RDF::Repository.load("etc/doap.ttl")
26
27
  # end
27
28
  #
28
- # @author [Gregg Kellogg](http://greggkellogg.net/)
29
+ # @author [Gregg Kellogg](https://greggkellogg.net/)
29
30
  module Normalize
30
31
  require 'rdf/normalize/format'
31
32
  autoload :Base, 'rdf/normalize/base'
32
33
  autoload :Carroll2001,'rdf/normalize/carroll2001'
33
34
  autoload :URGNA2012, 'rdf/normalize/urgna2012'
34
- autoload :URDNA2015, 'rdf/normalize/urdna2015'
35
+ autoload :RDFC10, 'rdf/normalize/rdfc10'
35
36
  autoload :VERSION, 'rdf/normalize/version'
36
37
  autoload :Writer, 'rdf/normalize/writer'
37
38
 
@@ -42,19 +43,19 @@ module RDF
42
43
  ALGORITHMS = {
43
44
  carroll2001: :Carroll2001,
44
45
  urgna2012: :URGNA2012,
45
- urdna2015: :URDNA2015
46
+ rdfc10: :RDFC10
46
47
  }.freeze
47
48
 
48
49
  ##
49
50
  # Creates a new normalizer instance using either the specified or default normalizer algorithm
50
51
  # @param [RDF::Enumerable] enumerable
51
52
  # @param [Hash{Symbol => Object}] options
52
- # @option options [Base] :algorithm (:urdna2015)
53
- # One of `:carroll2001`, `:urgna2012`, or `:urdna2015`
53
+ # @option options [Base] :algorithm (:rdfc10)
54
+ # One of `:carroll2001`, `:urgna2012`, or `:rdfc10`
54
55
  # @return [RDF::Normalize::Base]
55
56
  # @raise [ArgumentError] selected algorithm not defined
56
57
  def new(enumerable, **options)
57
- algorithm = options.fetch(:algorithm, :urdna2015)
58
+ algorithm = options.fetch(:algorithm, :rdfc10)
58
59
  raise ArgumentError, "No algoritm defined for #{algorithm.to_sym}" unless ALGORITHMS.has_key?(algorithm)
59
60
  algorithm_class = const_get(ALGORITHMS[algorithm])
60
61
  algorithm_class.new(enumerable, **options)
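`RDF::Normalize.new` looks up the `:algorithm` option in `ALGORITHMS`, defaulting to `:rdfc10`, and raises `ArgumentError` for unknown keys. Because `:urdna2015` was replaced rather than aliased, callers that still pass `algorithm: :urdna2015` would get that error with 0.6.0. The dispatch is sketched below with hypothetical placeholder classes under a made-up `NormalizeSketch` module. Only the lookup logic follows the hunk, and the placeholder classes do no normalization.

```ruby
module NormalizeSketch
  # Placeholder algorithm classes that just remember their inputs.
  Algorithm   = Struct.new(:enumerable, :options)
  Carroll2001 = Class.new(Algorithm)
  URGNA2012   = Class.new(Algorithm)
  RDFC10      = Class.new(Algorithm)

  ALGORITHMS = {
    carroll2001: :Carroll2001,
    urgna2012:   :URGNA2012,
    rdfc10:      :RDFC10
  }.freeze

  # Mirrors the lookup in RDF::Normalize.new: default to :rdfc10 and
  # reject keys that are not in ALGORITHMS (String keys included, since
  # the table is keyed by Symbol).
  def self.new(enumerable, **options)
    algorithm = options.fetch(:algorithm, :rdfc10)
    unless ALGORITHMS.key?(algorithm)
      raise ArgumentError, "No algorithm defined for #{algorithm.to_sym}"
    end
    const_get(ALGORITHMS[algorithm]).new(enumerable, options)
  end
end

NormalizeSketch.new([:quad]).class   # => NormalizeSketch::RDFC10
```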
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rdf-normalize
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.5.0
4
+ version: 0.6.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Gregg Kellogg
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2021-12-07 00:00:00.000000000 Z
11
+ date: 2023-06-10 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rdf
@@ -53,21 +53,21 @@ dependencies:
53
53
  - !ruby/object:Gem::Version
54
54
  version: '3.10'
55
55
  - !ruby/object:Gem::Dependency
56
- name: webmock
56
+ name: json-ld
57
57
  requirement: !ruby/object:Gem::Requirement
58
58
  requirements:
59
59
  - - "~>"
60
60
  - !ruby/object:Gem::Version
61
- version: '3.11'
61
+ version: '3.2'
62
62
  type: :development
63
63
  prerelease: false
64
64
  version_requirements: !ruby/object:Gem::Requirement
65
65
  requirements:
66
66
  - - "~>"
67
67
  - !ruby/object:Gem::Version
68
- version: '3.11'
68
+ version: '3.2'
69
69
  - !ruby/object:Gem::Dependency
70
- name: json-ld
70
+ name: rdf-trig
71
71
  requirement: !ruby/object:Gem::Requirement
72
72
  requirements:
73
73
  - - "~>"
@@ -94,7 +94,7 @@ dependencies:
94
94
  - - "~>"
95
95
  - !ruby/object:Gem::Version
96
96
  version: '0.9'
97
- description: RDF::Normalize is a Graph normalizer for the RDF.rb library suite.
97
+ description: RDF::Normalize performs Dataset Canonicalization for RDF.rb.
98
98
  email: public-rdf-ruby@w3.org
99
99
  executables: []
100
100
  extensions: []
@@ -108,14 +108,19 @@ files:
108
108
  - lib/rdf/normalize/base.rb
109
109
  - lib/rdf/normalize/carroll2001.rb
110
110
  - lib/rdf/normalize/format.rb
111
- - lib/rdf/normalize/urdna2015.rb
111
+ - lib/rdf/normalize/rdfc10.rb
112
112
  - lib/rdf/normalize/urgna2012.rb
113
113
  - lib/rdf/normalize/version.rb
114
114
  - lib/rdf/normalize/writer.rb
115
115
  homepage: https://github.com/ruby-rdf/rdf-normalize
116
116
  licenses:
117
117
  - Unlicense
118
- metadata: {}
118
+ metadata:
119
+ documentation_uri: https://ruby-rdf.github.io/rdf-normalize
120
+ bug_tracker_uri: https://github.com/ruby-rdf/rdf-normalize/issues
121
+ homepage_uri: https://github.com/ruby-rdf/rdf-normalize
122
+ mailing_list_uri: https://lists.w3.org/Archives/Public/public-rdf-ruby/
123
+ source_code_uri: https://github.com/ruby-rdf/rdf-normalize
119
124
  post_install_message:
120
125
  rdoc_options: []
121
126
  require_paths:
@@ -131,7 +136,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
131
136
  - !ruby/object:Gem::Version
132
137
  version: '0'
133
138
  requirements: []
134
- rubygems_version: 3.3.3
139
+ rubygems_version: 3.4.13
135
140
  signing_key:
136
141
  specification_version: 4
137
142
  summary: RDF Graph normalizer for Ruby.
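The new `metadata:` entries use link keys that RubyGems recognizes (`documentation_uri`, `bug_tracker_uri`, `homepage_uri`, `mailing_list_uri`, `source_code_uri`) and that rubygems.org shows as links on a gem's page. The gemspec itself is not part of this diff. The excerpt below is therefore a hypothetical reconstruction of the kind of `Gem::Specification` that would serialize to the values above, not the project's actual gemspec.

```ruby
require 'rubygems'

# Hypothetical gemspec excerpt reproducing the published metadata values.
spec = Gem::Specification.new do |s|
  s.name        = "rdf-normalize"
  s.version     = "0.6.0"
  s.summary     = "RDF Graph normalizer for Ruby."
  s.description = "RDF::Normalize performs Dataset Canonicalization for RDF.rb."
  s.metadata    = {
    "documentation_uri" => "https://ruby-rdf.github.io/rdf-normalize",
    "bug_tracker_uri"   => "https://github.com/ruby-rdf/rdf-normalize/issues",
    "homepage_uri"      => "https://github.com/ruby-rdf/rdf-normalize",
    "mailing_list_uri"  => "https://lists.w3.org/Archives/Public/public-rdf-ruby/",
    "source_code_uri"   => "https://github.com/ruby-rdf/rdf-normalize"
  }
end

puts spec.metadata["documentation_uri"]
```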
data/lib/rdf/normalize/urdna2015.rb REMOVED
@@ -1,263 +0,0 @@
1
- module RDF::Normalize
2
- class URDNA2015
3
- include RDF::Enumerable
4
- include RDF::Util::Logger
5
- include Base
6
-
7
- ##
8
- # Create an enumerable with grounded nodes
9
- #
10
- # @param [RDF::Enumerable] enumerable
11
- # @return [RDF::Enumerable]
12
- def initialize(enumerable, **options)
13
- @dataset, @options = enumerable, options
14
- end
15
-
16
- def each(&block)
17
- ns = NormalizationState.new(@options)
18
- normalize_statements(ns, &block)
19
- end
20
-
21
- protected
22
- def normalize_statements(ns, &block)
23
- # Map BNodes to the statements they are used by
24
- dataset.each_statement do |statement|
25
- statement.to_quad.compact.select(&:node?).each do |node|
26
- ns.add_statement(node, statement)
27
- end
28
- end
29
-
30
- non_normalized_identifiers, simple = ns.bnode_to_statements.keys, true
31
-
32
- while simple
33
- simple = false
34
- ns.hash_to_bnodes = {}
35
-
36
- # Calculate hashes for first degree nodes
37
- non_normalized_identifiers.each do |node|
38
- hash = log_depth {ns.hash_first_degree_quads(node)}
39
- log_debug("1deg") {"hash: #{hash}"}
40
- ns.add_bnode_hash(node, hash)
41
- end
42
-
43
- # Create canonical replacements for hashes mapping to a single node
44
- ns.hash_to_bnodes.keys.sort.each do |hash|
45
- identifier_list = ns.hash_to_bnodes[hash]
46
- next if identifier_list.length > 1
47
- node = identifier_list.first
48
- id = ns.canonical_issuer.issue_identifier(node)
49
- log_debug("single node") {"node: #{node.to_ntriples}, hash: #{hash}, id: #{id}"}
50
- non_normalized_identifiers -= identifier_list
51
- ns.hash_to_bnodes.delete(hash)
52
- simple = true
53
- end
54
- end
55
-
56
- # Iterate over hashs having more than one node
57
- ns.hash_to_bnodes.keys.sort.each do |hash|
58
- identifier_list = ns.hash_to_bnodes[hash]
59
-
60
- log_debug("multiple nodes") {"node: #{identifier_list.map(&:to_ntriples).join(",")}, hash: #{hash}"}
61
- hash_path_list = []
62
-
63
- # Create a hash_path_list for all bnodes using a temporary identifier used to create canonical replacements
64
- identifier_list.each do |identifier|
65
- next if ns.canonical_issuer.issued.include?(identifier)
66
- temporary_issuer = IdentifierIssuer.new("_:b")
67
- temporary_issuer.issue_identifier(identifier)
68
- hash_path_list << log_depth {ns.hash_n_degree_quads(identifier, temporary_issuer)}
69
- end
70
- log_debug("->") {"hash_path_list: #{hash_path_list.map(&:first).inspect}"}
71
-
72
- # Create canonical replacements for nodes
73
- hash_path_list.sort_by(&:first).map(&:last).each do |issuer|
74
- issuer.issued.each do |node|
75
- id = ns.canonical_issuer.issue_identifier(node)
76
- log_debug("-->") {"node: #{node.to_ntriples}, id: #{id}"}
77
- end
78
- end
79
- end
80
-
81
- # Yield statements using BNodes from canonical replacements
82
- dataset.each_statement do |statement|
83
- if statement.has_blank_nodes?
84
- quad = statement.to_quad.compact.map do |term|
85
- term.node? ? RDF::Node.intern(ns.canonical_issuer.identifier(term)[2..-1]) : term
86
- end
87
- block.call RDF::Statement.from(quad)
88
- else
89
- block.call statement
90
- end
91
- end
92
- end
93
-
94
- private
95
-
96
- class NormalizationState
97
- include RDF::Util::Logger
98
-
99
- attr_accessor :bnode_to_statements
100
- attr_accessor :hash_to_bnodes
101
- attr_accessor :canonical_issuer
102
-
103
- def initialize(options)
104
- @options = options
105
- @bnode_to_statements, @hash_to_bnodes, @canonical_issuer = {}, {}, IdentifierIssuer.new("_:c14n")
106
- end
107
-
108
- def add_statement(node, statement)
109
- bnode_to_statements[node] ||= []
110
- bnode_to_statements[node] << statement unless bnode_to_statements[node].include?(statement)
111
- end
112
-
113
- def add_bnode_hash(node, hash)
114
- hash_to_bnodes[hash] ||= []
115
- hash_to_bnodes[hash] << node unless hash_to_bnodes[hash].include?(node)
116
- end
117
-
118
- # @param [RDF::Node] node
119
- # @return [String] the SHA256 hexdigest hash of statements using this node, with replacements
120
- def hash_first_degree_quads(node)
121
- quads = bnode_to_statements[node].
122
- map do |statement|
123
- quad = statement.to_quad.map do |t|
124
- case t
125
- when node then RDF::Node("a")
126
- when RDF::Node then RDF::Node("z")
127
- else t
128
- end
129
- end
130
- RDF::NQuads::Writer.serialize(RDF::Statement.from(quad))
131
- end
132
-
133
- log_debug("1deg") {"node: #{node}, quads: #{quads}"}
134
- hexdigest(quads.sort.join)
135
- end
136
-
137
- # @param [RDF::Node] related
138
- # @param [RDF::Statement] statement
139
- # @param [IdentifierIssuer] issuer
140
- # @param [String] position one of :s, :o, or :g
141
- # @return [String] the SHA256 hexdigest hash
142
- def hash_related_node(related, statement, issuer, position)
143
- identifier = canonical_issuer.identifier(related) ||
144
- issuer.identifier(related) ||
145
- hash_first_degree_quads(related)
146
- input = position.to_s
147
- input << statement.predicate.to_ntriples unless position == :g
148
- input << identifier
149
- log_debug("hrel") {"input: #{input.inspect}, hash: #{hexdigest(input)}"}
150
- hexdigest(input)
151
- end
152
-
153
- # @param [RDF::Node] identifier
154
- # @param [IdentifierIssuer] issuer
155
- # @return [Array<String,IdentifierIssuer>] the Hash and issuer
156
- def hash_n_degree_quads(identifier, issuer)
157
- log_debug("ndeg") {"identifier: #{identifier.to_ntriples}"}
158
-
159
- # hash to related blank nodes map
160
- map = {}
161
-
162
- bnode_to_statements[identifier].each do |statement|
163
- hash_related_statement(identifier, statement, issuer, map)
164
- end
165
-
166
- data_to_hash = ""
167
-
168
- log_debug("ndeg") {"map: #{map.map {|h,l| "#{h}: #{l.map(&:to_ntriples)}"}.join('; ')}"}
169
- log_depth do
170
- map.keys.sort.each do |hash|
171
- list = map[hash]
172
- # Iterate over related nodes
173
- chosen_path, chosen_issuer = "", nil
174
- data_to_hash += hash
175
-
176
- list.permutation do |permutation|
177
- log_debug("ndeg") {"perm: #{permutation.map(&:to_ntriples).join(",")}"}
178
- issuer_copy, path, recursion_list = issuer.dup, "", []
179
-
180
- permutation.each do |related|
181
- if canonical_issuer.identifier(related)
182
- path << canonical_issuer.issue_identifier(related)
183
- else
184
- recursion_list << related if !issuer_copy.identifier(related)
185
- path << issuer_copy.issue_identifier(related)
186
- end
187
-
188
- # Skip to the next permutation if chosen path isn't empty and the path is greater than the chosen path
189
- break if !chosen_path.empty? && path.length >= chosen_path.length
190
- end
191
- log_debug("ndeg") {"hash: #{hash}, path: #{path}, recursion: #{recursion_list.map(&:to_ntriples)}"}
192
-
193
- recursion_list.each do |related|
194
- result = log_depth {hash_n_degree_quads(related, issuer_copy)}
195
- path << issuer_copy.issue_identifier(related)
196
- path << "<#{result.first}>"
197
- issuer_copy = result.last
198
- break if !chosen_path.empty? && path.length >= chosen_path.length && path > chosen_path
199
- end
200
-
201
- if chosen_path.empty? || path < chosen_path
202
- chosen_path, chosen_issuer = path, issuer_copy
203
- end
204
- end
205
-
206
- data_to_hash += chosen_path
207
- issuer = chosen_issuer
208
- end
209
- end
210
-
211
- log_debug("ndeg") {"datatohash: #{data_to_hash.inspect}, hash: #{hexdigest(data_to_hash)}"}
212
- return [hexdigest(data_to_hash), issuer]
213
- end
214
-
215
- protected
216
-
217
- def hexdigest(val)
218
- Digest::SHA256.hexdigest(val)
219
- end
220
-
221
- # Group adjacent bnodes by hash
222
- def hash_related_statement(identifier, statement, issuer, map)
223
- statement.to_h(:s, :p, :o, :g).each do |pos, term|
224
- next if !term.is_a?(RDF::Node) || term == identifier
225
-
226
- hash = log_depth {hash_related_node(term, statement, issuer, pos)}
227
- map[hash] ||= []
228
- map[hash] << term unless map[hash].include?(term)
229
- end
230
- end
231
- end
232
-
233
- class IdentifierIssuer
234
- def initialize(prefix = "_:c14n")
235
- @prefix, @counter, @issued = prefix, 0, {}
236
- end
237
-
238
- # Return an identifier for this BNode
239
- def issue_identifier(node)
240
- @issued[node] ||= begin
241
- res, @counter = @prefix + @counter.to_s, @counter + 1
242
- res
243
- end
244
- end
245
-
246
- def issued
247
- @issued.keys
248
- end
249
-
250
- def identifier(node)
251
- @issued[node]
252
- end
253
-
254
- # Duplicate this issuer, ensuring that the issued identifiers remain distinct
255
- # @return [IdentifierIssuer]
256
- def dup
257
- other = super
258
- other.instance_variable_set(:@issued, @issued.dup)
259
- other
260
- end
261
- end
262
- end
263
- end
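Comparing the removed urdna2015.rb with the new rdfc10.rb shows that both issuers dropped `_:` from their prefixes (`"c14n"` and `"b"` instead of `"_:c14n"` and `"_:b"`). The new code compensates by prepending `_:` wherever a label enters a hash input or path. It also interns output nodes from the label directly, instead of slicing it with `[2..-1]`. The check below confirms that, for these code paths, both conventions produce the same strings. It assumes labels are plain prefix-plus-counter strings, and the helper names are hypothetical.

```ruby
OLD_PREFIX = "_:c14n"  # IdentifierIssuer default in 0.5.0 (urdna2015.rb)
NEW_PREFIX = "c14n"    # IdentifierIssuer default in 0.6.0 (rdfc10.rb)

def label(prefix, counter)
  "#{prefix}#{counter}"
end

# Blank-node id written to the output.
def old_output_id(counter)
  label(OLD_PREFIX, counter)[2..-1]   # 0.5.0 stripped the leading "_:"
end

def new_output_id(counter)
  label(NEW_PREFIX, counter)          # 0.6.0 uses the label as issued
end

# Input to hash_related_node for a related node that already has a label.
def old_related_input(position, predicate, counter)
  position.to_s + predicate + label(OLD_PREFIX, counter)
end

def new_related_input(position, predicate, counter)
  "#{position}" + predicate + "_:" + label(NEW_PREFIX, counter)
end

# Path segment in Hash N-Degree Quads for the temporary issuer.
def old_path_segment(counter)
  label("_:b", counter)
end

def new_path_segment(counter)
  "_:" + label("b", counter)
end

p old_output_id(3) == new_output_id(3)   # true
```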