rdf-normalize 0.5.0 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 60932e4daabb349e76fb838c46aeea30ea39ff2406d7fd2f08b4df20b22975e4
-   data.tar.gz: d1f1b87ac1a46ad18c7a1e848c8e7306b20cedbdb374cc1a197f8a2235875b27
+   metadata.gz: 15478756de443574bde6120436faf09bec1f7e40dcfc60f39fc97af92e686738
+   data.tar.gz: d5617da52a4d7e3429452f691e4a9ccb7f6ac8bedcef6dd66583b1322e0b57f0
  SHA512:
-   metadata.gz: 263602b7c6861bf74745600e5828f5095cb9a3a3f08974621d050982a78bc6d4d6e1c6e421cc69e5bc9fa0146cfd0521b0f4817989b8644b4d25fa02c13015dd
-   data.tar.gz: 98a1391dd232a2db5ea8353a51268fcfe9def16dc7a8f309bb305c7feeb9df85f8e96c8cc5dbcf4b3486e34d0133b33bdb353d2aa567bb948d412bb36be5e202
+   metadata.gz: 7c2ccd4449f12d5095702d19a8c1d27539aa5afa23c8b96ffcf6f43ee0d6d10fd763e2dbc98f2ef008ede3edc3fda1801eb6a1cd3ad0e80e3b82995017ae93e4
+   data.tar.gz: f760c7336703292679c82b6abbea86ffe7b8ac1b803508c187d8aee7bcd8cd635d0b039d928b7d145198f7df884027aeb911fa2e97e0e9d171cae92e4d26ed0b
data/README.md CHANGED
@@ -1,13 +1,13 @@
  # RDF::Normalize
  RDF Graph normalizer for [RDF.rb][RDF.rb].

- [![Gem Version](https://badge.fury.io/rb/rdf-normalize.png)](https://badge.fury.io/rb/rdf-normalize)
+ [![Gem Version](https://badge.fury.io/rb/rdf-normalize.svg)](https://badge.fury.io/rb/rdf-normalize)
  [![Build Status](https://github.com/ruby-rdf/rdf-normalize/workflows/CI/badge.svg?branch=develop)](https://github.com/ruby-rdf/rdf-normalize/actions?query=workflow%3ACI)
  [![Coverage Status](https://coveralls.io/repos/ruby-rdf/rdf-normalize/badge.svg?branch=develop)](https://coveralls.io/github/ruby-rdf/rdf-normalize?branch=develop)
  [![Gitter chat](https://badges.gitter.im/ruby-rdf/rdf.png)](https://gitter.im/ruby-rdf/rdf)

  ## Description
- This is a [Ruby][] implementation of a [RDF Normalize][] for [RDF.rb][].
+ This is a [Ruby][] implementation of [RDF Dataset Canonicalization][] for [RDF.rb][].

  ## Features
  RDF::Normalize generates normalized [N-Quads][] output for an RDF Dataset using the algorithm
@@ -16,8 +16,8 @@ to serialize normalized statements.

  Algorithms implemented:

- * [URGNA2012](https://json-ld.github.io/normalization/spec/index.html#dfn-urgna2012)
- * [URDNA2015](https://json-ld.github.io/normalization/spec/index.html#dfn-urdna2015)
+ * [URGNA2012](https://www.w3.org/TR/rdf-canon/#dfn-urgna2012)
+ * [RDFC-1.0](https://www.w3.org/TR/rdf-canon/#dfn-rdfc-1-0)

  Install with `gem install rdf-normalize`

@@ -27,7 +27,17 @@ Install with `gem install rdf-normalize`
  ## Usage

  ## Documentation
- Full documentation available on [Rubydoc.info][Normalize doc]
+
+ Full documentation available on [GitHub][Normalize doc]
+
+ ## Examples
+
+ ### Returning normalized N-Quads
+
+     require 'rdf/normalize'
+     require 'rdf/turtle'
+     g = RDF::Graph.load("etc/doap.ttl")
+     puts g.dump(:normalize)

  ### Principal Classes
  * {RDF::Normalize}
@@ -35,8 +45,7 @@ Full documentation available on [Rubydoc.info][Normalize doc]
  * {RDF::Normalize::Format}
  * {RDF::Normalize::Writer}
  * {RDF::Normalize::URGNA2012}
- * {RDF::Normalize::URDNA2015}
-
+ * {RDF::Normalize::RDFC10}

  ## Dependencies

@@ -80,7 +89,7 @@ see <https://unlicense.org/> or the accompanying {file:LICENSE} file.
  [YARD]: https://yardoc.org/
  [YARD-GS]: https://rubydoc.info/docs/yard/file/docs/GettingStarted.md
  [PDD]: https://unlicense.org/#unlicensing-contributions
- [RDF.rb]: https://rubydoc.info/github/ruby-rdf/rdf-normalize
+ [RDF.rb]: https://ruby-rdf.github.io/rdf-normalize
  [N-Triples]: https://www.w3.org/TR/rdf-testcases/#ntriples
- [RDF Normalize]: https://json-ld.github.io/normalization/spec/
- [Normalize doc]: https://rubydoc.info/github/ruby-rdf/rdf-normalize/master
+ [RDF Dataset Canonicalization]: https://www.w3.org/TR/rdf-canon/
+ [Normalize doc]: https://ruby-rdf.github.io/rdf-normalize/
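
The README example above uses the writer's defaults. A minimal sketch of selecting the algorithm explicitly through the same `dump` path (assumptions: the rdf-turtle gem is installed and `etc/doap.ttl` exists, as in the README example; the `:algorithm` option is the one documented in `lib/rdf/normalize.rb` further down this diff):

    require 'rdf/normalize'
    require 'rdf/turtle'

    graph = RDF::Graph.load("etc/doap.ttl")
    # :rdfc10 is the 0.6.0 default; :urgna2012 and :carroll2001 remain selectable.
    puts graph.dump(:normalize, algorithm: :rdfc10)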
data/VERSION CHANGED
@@ -1 +1 @@
- 0.5.0
+ 0.6.0
data/lib/rdf/normalize/rdfc10.rb ADDED
@@ -0,0 +1,390 @@
require 'rdf/nquads'
begin
  require 'json'
rescue LoadError
  # Used for debug output
end

module RDF::Normalize
  class RDFC10
    include RDF::Enumerable
    include RDF::Util::Logger
    include Base

    ##
    # Create an enumerable with grounded nodes
    #
    # @param [RDF::Enumerable] enumerable
    # @return [RDF::Enumerable]
    def initialize(enumerable, **options)
      @dataset, @options = enumerable, options
    end

    def each(&block)
      ns = NormalizationState.new(@options)
      log_debug("ca:")
      log_debug(" log point", "Entering the canonicalization function (4.5.3).")
      log_depth(depth: 2) {normalize_statements(ns, &block)}
    end

    protected
    def normalize_statements(ns, &block)
      # Step 2: Map BNodes to the statements they are used by
      dataset.each_statement do |statement|
        statement.to_quad.compact.select(&:node?).each do |node|
          ns.add_statement(node, statement)
        end
      end
      log_debug("ca.2:")
      log_debug(" log point", "Extract quads for each bnode (4.5.3 (2)).")
      log_debug(" Bnode to quads:")
      if logger && logger.level == 0
        ns.bnode_to_statements.each do |bn, statements|
          log_debug(" #{bn.id}:")
          statements.each do |s|
            log_debug {" - #{s.to_nquads.strip}"}
          end
        end
      end

      ns.hash_to_bnodes = {}

      # Step 3: Calculate hashes for first degree nodes
      log_debug("ca.3:")
      log_debug(" log point", "Calculated first degree hashes (4.5.3 (3)).")
      log_debug(" with:")
      ns.bnode_to_statements.each_key do |node|
        log_debug(" - identifier") {node.id}
        log_debug(" h1dq:")
        hash = log_depth(depth: 8) {ns.hash_first_degree_quads(node)}
        ns.add_bnode_hash(node, hash)
      end

      # Step 4: Create canonical replacements for hashes mapping to a single node
      log_debug("ca.4:")
      log_debug(" log point", "Create canonical replacements for hashes mapping to a single node (4.5.3 (4)).")
      log_debug(" with:") unless ns.hash_to_bnodes.empty?
      ns.hash_to_bnodes.keys.sort.each do |hash|
        identifier_list = ns.hash_to_bnodes[hash]
        next if identifier_list.length > 1
        node = identifier_list.first
        id = ns.canonical_issuer.issue_identifier(node)
        log_debug(" - identifier") {node.id}
        log_debug(" hash", hash)
        log_debug(" canonical label", id)
        ns.hash_to_bnodes.delete(hash)
      end

      # Step 5: Iterate over hashs having more than one node
      log_debug("ca.5:") unless ns.hash_to_bnodes.empty?
      log_debug(" log point", "Calculate hashes for identifiers with shared hashes (4.5.3 (5)).")
      log_debug(" with:") unless ns.hash_to_bnodes.empty?
      ns.hash_to_bnodes.keys.sort.each do |hash|
        identifier_list = ns.hash_to_bnodes[hash]

        log_debug(" - hash", hash)
        log_debug(" identifier list") {identifier_list.map(&:id).to_json(indent: ' ')}
        hash_path_list = []

        # Create a hash_path_list for all bnodes using a temporary identifier used to create canonical replacements
        log_debug(" ca.5.2:")
        log_debug(" log point", "Calculate hashes for identifiers with shared hashes (4.5.3 (5.2)).")
        log_debug(" with:") unless identifier_list.empty?
        identifier_list.each do |identifier|
          next if ns.canonical_issuer.issued.include?(identifier)
          temporary_issuer = IdentifierIssuer.new("b")
          temporary_issuer.issue_identifier(identifier)
          log_debug(" - identifier") {identifier.id}
          hash_path_list << log_depth(depth: 12) {ns.hash_n_degree_quads(identifier, temporary_issuer)}
        end

        # Create canonical replacements for nodes
        log_debug(" ca.5.3:") unless hash_path_list.empty?
        log_debug(" log point", "Canonical identifiers for temporary identifiers (4.5.3 (5.3)).")
        log_debug(" issuer:") unless hash_path_list.empty?
        hash_path_list.sort_by(&:first).each do |result, issuer|
          issuer.issued.each do |node|
            id = ns.canonical_issuer.issue_identifier(node)
            log_debug(" - blank node") {node.id}
            log_debug(" canonical identifier", id)
          end
        end
      end

      # Step 6: Yield statements using BNodes from canonical replacements
      dataset.each_statement do |statement|
        if statement.has_blank_nodes?
          quad = statement.to_quad.compact.map do |term|
            term.node? ? RDF::Node.intern(ns.canonical_issuer.identifier(term)) : term
          end
          block.call RDF::Statement.from(quad)
        else
          block.call statement
        end
      end

      log_debug("ca.6:")
      log_debug(" log point", "Replace original with canonical labels (4.5.3 (6)).")
      log_debug(" canonical issuer: #{ns.canonical_issuer.inspect}")
      dataset
    end

    private

    class NormalizationState
      include RDF::Util::Logger

      attr_accessor :bnode_to_statements
      attr_accessor :hash_to_bnodes
      attr_accessor :canonical_issuer

      def initialize(options)
        @options = options
        @bnode_to_statements, @hash_to_bnodes, @canonical_issuer = {}, {}, IdentifierIssuer.new("c14n")
      end

      def add_statement(node, statement)
        bnode_to_statements[node] ||= []
        bnode_to_statements[node] << statement unless bnode_to_statements[node].any? {|st| st.eql?(statement)}
      end

      def add_bnode_hash(node, hash)
        hash_to_bnodes[hash] ||= []
        # Match on object IDs of nodes, rather than simple node equality
        hash_to_bnodes[hash] << node unless hash_to_bnodes[hash].any? {|n| n.eql?(node)}
      end

      # This algorithm calculates a hash for a given blank node across the quads in a dataset in which that blank node is a component. If the hash uniquely identifies that blank node, no further examination is necessary. Otherwise, a hash will be created for the blank node using the algorithm in [4.9 Hash N-Degree Quads](https://w3c.github.io/rdf-canon/spec/#hash-nd-quads) invoked via [4.5 Canonicalization Algorithm](https://w3c.github.io/rdf-canon/spec/#canon-algorithm).
      #
      # @param [RDF::Node] node The reference blank node identifier
      # @return [String] the SHA256 hexdigest hash of statements using this node, with replacements
      def hash_first_degree_quads(node)
        nquads = bnode_to_statements[node].
          map do |statement|
            quad = statement.to_quad.map do |t|
              case t
              when node then RDF::Node("a")
              when RDF::Node then RDF::Node("z")
              else t
              end
            end
            RDF::Statement.from(quad).to_nquads
          end
        log_debug("log point", "Hash First Degree Quads function (4.7.3).")
        log_debug("nquads:")
        nquads.each do |q|
          log_debug {" - #{q.strip}"}
        end

        result = hexdigest(nquads.sort.join)
        log_debug("hash") {result}
        result
      end

      # @param [RDF::Node] related
      # @param [RDF::Statement] statement
      # @param [IdentifierIssuer] issuer
      # @param [String] position one of :s, :o, or :g
      # @return [String] the SHA256 hexdigest hash
      def hash_related_node(related, statement, issuer, position)
        log_debug("related") {related.id}
        input = "#{position}"
        input << statement.predicate.to_ntriples unless position == :g
        if identifier = (canonical_issuer.identifier(related) ||
                         issuer.identifier(related))
          input << "_:#{identifier}"
        else
          log_debug("h1dq:")
          input << log_depth(depth: 2) do
            hash_first_degree_quads(related)
          end
        end
        log_debug("input") {input.inspect}
        log_debug("hash") {hexdigest(input)}
        hexdigest(input)
      end

      # @param [RDF::Node] identifier
      # @param [IdentifierIssuer] issuer
      # @return [Array<String,IdentifierIssuer>] the Hash and issuer
      def hash_n_degree_quads(identifier, issuer)
        log_debug("hndq:")
        log_debug(" log point", "Hash N-Degree Quads function (4.9.3).")
        log_debug(" identifier") {identifier.id}
        log_debug(" issuer") {issuer.inspect}

        # hash to related blank nodes map
        hn = {}

        log_debug(" hndq.2:")
        log_debug(" log point", "Quads for identifier (4.9.3 (2)).")
        log_debug(" quads:")
        bnode_to_statements[identifier].each do |s|
          log_debug {" - #{s.to_nquads.strip}"}
        end

        # Step 3
        log_debug(" hndq.3:")
        log_debug(" log point", "Hash N-Degree Quads function (4.9.3 (3)).")
        log_debug(" with:") unless bnode_to_statements[identifier].empty?
        bnode_to_statements[identifier].each do |statement|
          log_debug {" - quad: #{statement.to_nquads.strip}"}
          log_debug(" hndq.3.1:")
          log_debug(" log point", "Hash related bnode component (4.9.3 (3.1))")
          log_depth(depth: 10) {hash_related_statement(identifier, statement, issuer, hn)}
        end
        log_debug(" Hash to bnodes:")
        hn.each do |k,v|
          log_debug(" #{k}:")
          v.each do |vv|
            log_debug(" - #{vv.id}")
          end
        end

        data_to_hash = ""

        # Step 5
        log_debug(" hndq.5:")
        log_debug(" log point", "Hash N-Degree Quads function (4.9.3 (5)), entering loop.")
        log_debug(" with:")
        hn.keys.sort.each do |hash|
          log_debug(" - related hash", hash)
          log_debug(" data to hash") {data_to_hash.to_json}
          list = hn[hash]
          # Iterate over related nodes
          chosen_path, chosen_issuer = "", nil
          data_to_hash += hash

          log_debug(" hndq.5.4:")
          log_debug(" log point", "Hash N-Degree Quads function (4.9.3 (5.4)), entering loop.")
          log_debug(" with:") unless list.empty?
          list.permutation do |permutation|
            log_debug(" - perm") {permutation.map(&:id).to_json(indent: ' ', space: ' ')}
            issuer_copy, path, recursion_list = issuer.dup, "", []

            log_debug(" hndq.5.4.4:")
            log_debug(" log point", "Hash N-Degree Quads function (4.9.3 (5.4.4)), entering loop.")
            log_debug(" with:")
            permutation.each do |related|
              log_debug(" - related") {related.id}
              log_debug(" path") {path.to_json}
              if canonical_issuer.identifier(related)
                path << '_:' + canonical_issuer.issue_identifier(related)
              else
                recursion_list << related if !issuer_copy.identifier(related)
                path << '_:' + issuer_copy.issue_identifier(related)
              end

              # Skip to the next permutation if chosen path isn't empty and the path is greater than the chosen path
              break if !chosen_path.empty? && path.length >= chosen_path.length
            end

            log_debug(" hndq.5.4.5:")
            log_debug(" log point", "Hash N-Degree Quads function (4.9.3 (5.4.5)), before possible recursion.")
            log_debug(" recursion list") {recursion_list.map(&:id).to_json(indent: ' ')}
            log_debug(" path") {path.to_json}
            log_debug(" with:") unless recursion_list.empty?
            recursion_list.each do |related|
              log_debug(" - related") {related.id}
              result = log_depth(depth: 18) {hash_n_degree_quads(related, issuer_copy)}
              path << '_:' + issuer_copy.issue_identifier(related)
              path << "<#{result.first}>"
              issuer_copy = result.last
              log_debug(" hndq.5.4.5.4:")
              log_debug(" log point", "Hash N-Degree Quads function (4.9.3 (5.4.5.4)), combine result of recursion.")
              log_debug(" path") {path.to_json}
              log_debug(" issuer copy") {issuer_copy.inspect}
              break if !chosen_path.empty? && path.length >= chosen_path.length && path > chosen_path
            end

            if chosen_path.empty? || path < chosen_path
              chosen_path, chosen_issuer = path, issuer_copy
            end
          end

          data_to_hash += chosen_path
          log_debug(" hndq.5.5:")
          log_debug(" log point", "Hash N-Degree Quads function (4.9.3 (5.5). End of current loop with Hn hashes.")
          log_debug(" chosen path") {chosen_path.to_json}
          log_debug(" data to hash") {data_to_hash.to_json}
          issuer = chosen_issuer
        end

        log_debug(" hndq.6:")
        log_debug(" log point", "Leaving Hash N-Degree Quads function (4.9.3).")
        log_debug(" hash") {hexdigest(data_to_hash)}
        log_depth(depth: 4) {log_debug("issuer") {issuer.inspect}}
        return [hexdigest(data_to_hash), issuer]
      end

      def inspect
        "NormalizationState:\nbnode_to_statements: #{inspect_bnode_to_statements}\nhash_to_bnodes: #{inspect_hash_to_bnodes}\ncanonical_issuer: #{canonical_issuer.inspect}"
      end

      def inspect_bnode_to_statements
        bnode_to_statements.map do |n, statements|
          "#{n.id}: #{statements.map {|s| s.to_nquads.strip}}"
        end.join(", ")
      end

      def inspect_hash_to_bnodes
      end

      protected

      def hexdigest(val)
        Digest::SHA256.hexdigest(val)
      end

      # Group adjacent bnodes by hash
      def hash_related_statement(identifier, statement, issuer, map)
        log_debug("with:") if statement.to_h.values.any? {|t| t.is_a?(RDF::Node)}
        statement.to_h(:s, :p, :o, :g).each do |pos, term|
          next if !term.is_a?(RDF::Node) || term == identifier

          log_debug(" - position", pos)
          hash = log_depth(depth: 4) {hash_related_node(term, statement, issuer, pos)}
          map[hash] ||= []
          map[hash] << term unless map[hash].any? {|n| n.eql?(term)}
        end
      end
    end

    class IdentifierIssuer
      def initialize(prefix = "c14n")
        @prefix, @counter, @issued = prefix, 0, {}
      end

      # Return an identifier for this BNode
      # @param [RDF::Node] node
      # @return [String] Canonical identifier for node
      def issue_identifier(node)
        @issued[node] ||= begin
          res, @counter = @prefix + @counter.to_s, @counter + 1
          res
        end
      end

      def issued
        @issued.keys
      end

      # @return [RDF::Node] Canonical identifier assigned to node
      def identifier(node)
        @issued[node]
      end

      # Duplicate this issuer, ensuring that the issued identifiers remain distinct
      # @return [IdentifierIssuer]
      def dup
        other = super
        other.instance_variable_set(:@issued, @issued.dup)
        other
      end

      def inspect
        "{#{@issued.map {|k,v| "#{k.id}: #{v}"}.join(', ')}}"
      end
    end
  end
end
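
For orientation, a small hedged usage sketch of the class above; the predicate IRI and literal are invented for illustration. `RDFC10#each` yields statements whose blank nodes have been relabeled by the canonical issuer, which uses the `c14n` prefix:

    require 'rdf'
    require 'rdf/normalize'

    graph = RDF::Graph.new
    subject = RDF::Node.new                       # arbitrary internal label
    graph << [subject, RDF::URI("http://example.org/name"), "Alice"]

    RDF::Normalize::RDFC10.new(graph).each do |statement|
      puts statement.to_nquads                    # blank node comes out as _:c14n0
    end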
data/lib/rdf/normalize/urgna2012.rb CHANGED
@@ -1,12 +1,12 @@
  module RDF::Normalize
-   class URGNA2012 < URDNA2015
+   class URGNA2012 < RDFC10

      def each(&block)
        ns = NormalizationState.new(@options)
        normalize_statements(ns, &block)
      end

-     class NormalizationState < URDNA2015::NormalizationState
+     class NormalizationState < RDFC10::NormalizationState
        protected

        # 2012 version uses SHA-1
@@ -23,9 +23,7 @@ module RDF::Normalize
          identifier = canonical_issuer.identifier(related) ||
            issuer.identifier(related) ||
            hash_first_degree_quads(related)
-         input = position.to_s
-         input << statement.predicate.to_s
-         input << identifier
+         input = "#{position}#{statement.predicate}#{identifier}"
          log_debug("hrel") {"input: #{input.inspect}, hash: #{hexdigest(input)}"}
          hexdigest(input)
        end
@@ -35,11 +33,11 @@ module RDF::Normalize
          if statement.subject.node? && statement.subject != identifier
            hash = log_depth {hash_related_node(statement.subject, statement, issuer, :p)}
            map[hash] ||= []
-           map[hash] << statement.subject unless map[hash].include?(statement.subject)
+           map[hash] << statement.subject unless map[hash].any? {|n| n.eql?(statement.subject)}
          elsif statement.object.node? && statement.object != identifier
            hash = log_depth {hash_related_node(statement.object, statement, issuer, :r)}
            map[hash] ||= []
-           map[hash] << statement.object unless map[hash].include?(statement.object)
+           map[hash] << statement.object unless map[hash].any? {|n| n.eql?(statement.object)}
          end
        end
      end
data/lib/rdf/normalize/version.rb CHANGED
@@ -1,5 +1,5 @@
  module RDF::Normalize::VERSION
-   VERSION_FILE = File.join(File.expand_path(File.dirname(__FILE__)), "..", "..", "..", "VERSION")
+   VERSION_FILE = File.expand_path("../../../../VERSION", __FILE__)
    MAJOR, MINOR, TINY, EXTRA = File.read(VERSION_FILE).chop.split(".")

    STRING = [MAJOR, MINOR, TINY, EXTRA].compact.join('.')
data/lib/rdf/normalize/writer.rb CHANGED
@@ -4,7 +4,7 @@ module RDF::Normalize
    #
    # Normalizes the enumerated statements into normal form in the form of N-Quads.
    #
-   # @author [Gregg Kellogg](http://greggkellogg.net/)
+   # @author [Gregg Kellogg](https://greggkellogg.net/)
    class Writer < RDF::NQuads::Writer
      format RDF::Normalize::Format

@@ -53,7 +53,7 @@ module RDF::Normalize
      #
      # @return [void]
      def write_epilogue
-       statements = RDF::Normalize.new(@repo, **@options).
+       RDF::Normalize.new(@repo, **@options).
          statements.
          reject(&:variable?).
          map {|s| format_statement(s)}.
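
The `write_epilogue` cleanup above sits behind the `:normalize` writer format. A short sketch of driving it through `Writer.buffer`, mirroring the example in the module documentation of `lib/rdf/normalize.rb` below (assumes the rdf-turtle gem is installed for reading the Turtle file):

    require 'rdf/turtle'
    require 'rdf/normalize'

    nquads = RDF::Normalize::Writer.buffer do |writer|
      writer << RDF::Repository.load("etc/doap.ttl")
    end
    puts nquads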
data/lib/rdf/normalize.rb CHANGED
@@ -1,4 +1,5 @@
  require 'rdf'
+ require 'digest'

  module RDF
    ##
@@ -25,13 +26,13 @@ module RDF
    #     writer << RDF::Repository.load("etc/doap.ttl")
    #   end
    #
-   # @author [Gregg Kellogg](http://greggkellogg.net/)
+   # @author [Gregg Kellogg](https://greggkellogg.net/)
    module Normalize
      require 'rdf/normalize/format'
      autoload :Base,        'rdf/normalize/base'
      autoload :Carroll2001, 'rdf/normalize/carroll2001'
      autoload :URGNA2012,   'rdf/normalize/urgna2012'
-     autoload :URDNA2015,   'rdf/normalize/urdna2015'
+     autoload :RDFC10,      'rdf/normalize/rdfc10'
      autoload :VERSION,     'rdf/normalize/version'
      autoload :Writer,      'rdf/normalize/writer'

@@ -42,19 +43,19 @@ module RDF
      ALGORITHMS = {
        carroll2001: :Carroll2001,
        urgna2012:   :URGNA2012,
-       urdna2015:   :URDNA2015
+       rdfc10:      :RDFC10
      }.freeze

      ##
      # Creates a new normalizer instance using either the specified or default normalizer algorithm
      # @param [RDF::Enumerable] enumerable
      # @param [Hash{Symbol => Object}] options
-     # @option options [Base] :algorithm (:urdna2015)
-     #   One of `:carroll2001`, `:urgna2012`, or `:urdna2015`
+     # @option options [Base] :algorithm (:rdfc10)
+     #   One of `:carroll2001`, `:urgna2012`, or `:rdfc10`
      # @return [RDF::Normalize::Base]
      # @raise [ArgumentError] selected algorithm not defined
      def new(enumerable, **options)
-       algorithm = options.fetch(:algorithm, :urdna2015)
+       algorithm = options.fetch(:algorithm, :rdfc10)
        raise ArgumentError, "No algorithm defined for #{algorithm.to_sym}" unless ALGORITHMS.has_key?(algorithm)
        algorithm_class = const_get(ALGORITHMS[algorithm])
        algorithm_class.new(enumerable, **options)
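
A hedged sketch of the factory behaviour this hunk documents: `:rdfc10` is now the default algorithm, and a key missing from `ALGORITHMS` raises `ArgumentError` (the `graph` variable is assumed to be any `RDF::Enumerable`, for example from the earlier sketches):

    normalizer = RDF::Normalize.new(graph)                        # defaults to :rdfc10
    normalizer = RDF::Normalize.new(graph, algorithm: :urgna2012)

    begin
      RDF::Normalize.new(graph, algorithm: :urdna2015)            # key removed in 0.6.0
    rescue ArgumentError => e
      puts e.message
    end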
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: rdf-normalize
  version: !ruby/object:Gem::Version
-   version: 0.5.0
+   version: 0.6.0
  platform: ruby
  authors:
  - Gregg Kellogg
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2021-12-07 00:00:00.000000000 Z
+ date: 2023-06-10 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rdf
@@ -53,21 +53,21 @@ dependencies:
        - !ruby/object:Gem::Version
          version: '3.10'
  - !ruby/object:Gem::Dependency
-   name: webmock
+   name: json-ld
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '3.11'
+         version: '3.2'
    type: :development
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '3.11'
+         version: '3.2'
  - !ruby/object:Gem::Dependency
-   name: json-ld
+   name: rdf-trig
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
@@ -94,7 +94,7 @@ dependencies:
      - - "~>"
        - !ruby/object:Gem::Version
          version: '0.9'
- description: RDF::Normalize is a Graph normalizer for the RDF.rb library suite.
+ description: RDF::Normalize performs Dataset Canonicalization for RDF.rb.
  email: public-rdf-ruby@w3.org
  executables: []
  extensions: []
@@ -108,14 +108,19 @@ files:
  - lib/rdf/normalize/base.rb
  - lib/rdf/normalize/carroll2001.rb
  - lib/rdf/normalize/format.rb
- - lib/rdf/normalize/urdna2015.rb
+ - lib/rdf/normalize/rdfc10.rb
  - lib/rdf/normalize/urgna2012.rb
  - lib/rdf/normalize/version.rb
  - lib/rdf/normalize/writer.rb
  homepage: https://github.com/ruby-rdf/rdf-normalize
  licenses:
  - Unlicense
- metadata: {}
+ metadata:
+   documentation_uri: https://ruby-rdf.github.io/rdf-normalize
+   bug_tracker_uri: https://github.com/ruby-rdf/rdf-normalize/issues
+   homepage_uri: https://github.com/ruby-rdf/rdf-normalize
+   mailing_list_uri: https://lists.w3.org/Archives/Public/public-rdf-ruby/
+   source_code_uri: https://github.com/ruby-rdf/rdf-normalize
  post_install_message:
  rdoc_options: []
  require_paths:
@@ -131,7 +136,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubygems_version: 3.3.3
+ rubygems_version: 3.4.13
  signing_key:
  specification_version: 4
  summary: RDF Graph normalizer for Ruby.
data/lib/rdf/normalize/urdna2015.rb DELETED
@@ -1,263 +0,0 @@
module RDF::Normalize
  class URDNA2015
    include RDF::Enumerable
    include RDF::Util::Logger
    include Base

    ##
    # Create an enumerable with grounded nodes
    #
    # @param [RDF::Enumerable] enumerable
    # @return [RDF::Enumerable]
    def initialize(enumerable, **options)
      @dataset, @options = enumerable, options
    end

    def each(&block)
      ns = NormalizationState.new(@options)
      normalize_statements(ns, &block)
    end

    protected
    def normalize_statements(ns, &block)
      # Map BNodes to the statements they are used by
      dataset.each_statement do |statement|
        statement.to_quad.compact.select(&:node?).each do |node|
          ns.add_statement(node, statement)
        end
      end

      non_normalized_identifiers, simple = ns.bnode_to_statements.keys, true

      while simple
        simple = false
        ns.hash_to_bnodes = {}

        # Calculate hashes for first degree nodes
        non_normalized_identifiers.each do |node|
          hash = log_depth {ns.hash_first_degree_quads(node)}
          log_debug("1deg") {"hash: #{hash}"}
          ns.add_bnode_hash(node, hash)
        end

        # Create canonical replacements for hashes mapping to a single node
        ns.hash_to_bnodes.keys.sort.each do |hash|
          identifier_list = ns.hash_to_bnodes[hash]
          next if identifier_list.length > 1
          node = identifier_list.first
          id = ns.canonical_issuer.issue_identifier(node)
          log_debug("single node") {"node: #{node.to_ntriples}, hash: #{hash}, id: #{id}"}
          non_normalized_identifiers -= identifier_list
          ns.hash_to_bnodes.delete(hash)
          simple = true
        end
      end

      # Iterate over hashs having more than one node
      ns.hash_to_bnodes.keys.sort.each do |hash|
        identifier_list = ns.hash_to_bnodes[hash]

        log_debug("multiple nodes") {"node: #{identifier_list.map(&:to_ntriples).join(",")}, hash: #{hash}"}
        hash_path_list = []

        # Create a hash_path_list for all bnodes using a temporary identifier used to create canonical replacements
        identifier_list.each do |identifier|
          next if ns.canonical_issuer.issued.include?(identifier)
          temporary_issuer = IdentifierIssuer.new("_:b")
          temporary_issuer.issue_identifier(identifier)
          hash_path_list << log_depth {ns.hash_n_degree_quads(identifier, temporary_issuer)}
        end
        log_debug("->") {"hash_path_list: #{hash_path_list.map(&:first).inspect}"}

        # Create canonical replacements for nodes
        hash_path_list.sort_by(&:first).map(&:last).each do |issuer|
          issuer.issued.each do |node|
            id = ns.canonical_issuer.issue_identifier(node)
            log_debug("-->") {"node: #{node.to_ntriples}, id: #{id}"}
          end
        end
      end

      # Yield statements using BNodes from canonical replacements
      dataset.each_statement do |statement|
        if statement.has_blank_nodes?
          quad = statement.to_quad.compact.map do |term|
            term.node? ? RDF::Node.intern(ns.canonical_issuer.identifier(term)[2..-1]) : term
          end
          block.call RDF::Statement.from(quad)
        else
          block.call statement
        end
      end
    end

    private

    class NormalizationState
      include RDF::Util::Logger

      attr_accessor :bnode_to_statements
      attr_accessor :hash_to_bnodes
      attr_accessor :canonical_issuer

      def initialize(options)
        @options = options
        @bnode_to_statements, @hash_to_bnodes, @canonical_issuer = {}, {}, IdentifierIssuer.new("_:c14n")
      end

      def add_statement(node, statement)
        bnode_to_statements[node] ||= []
        bnode_to_statements[node] << statement unless bnode_to_statements[node].include?(statement)
      end

      def add_bnode_hash(node, hash)
        hash_to_bnodes[hash] ||= []
        hash_to_bnodes[hash] << node unless hash_to_bnodes[hash].include?(node)
      end

      # @param [RDF::Node] node
      # @return [String] the SHA256 hexdigest hash of statements using this node, with replacements
      def hash_first_degree_quads(node)
        quads = bnode_to_statements[node].
          map do |statement|
            quad = statement.to_quad.map do |t|
              case t
              when node then RDF::Node("a")
              when RDF::Node then RDF::Node("z")
              else t
              end
            end
            RDF::NQuads::Writer.serialize(RDF::Statement.from(quad))
          end

        log_debug("1deg") {"node: #{node}, quads: #{quads}"}
        hexdigest(quads.sort.join)
      end

      # @param [RDF::Node] related
      # @param [RDF::Statement] statement
      # @param [IdentifierIssuer] issuer
      # @param [String] position one of :s, :o, or :g
      # @return [String] the SHA256 hexdigest hash
      def hash_related_node(related, statement, issuer, position)
        identifier = canonical_issuer.identifier(related) ||
          issuer.identifier(related) ||
          hash_first_degree_quads(related)
        input = position.to_s
        input << statement.predicate.to_ntriples unless position == :g
        input << identifier
        log_debug("hrel") {"input: #{input.inspect}, hash: #{hexdigest(input)}"}
        hexdigest(input)
      end

      # @param [RDF::Node] identifier
      # @param [IdentifierIssuer] issuer
      # @return [Array<String,IdentifierIssuer>] the Hash and issuer
      def hash_n_degree_quads(identifier, issuer)
        log_debug("ndeg") {"identifier: #{identifier.to_ntriples}"}

        # hash to related blank nodes map
        map = {}

        bnode_to_statements[identifier].each do |statement|
          hash_related_statement(identifier, statement, issuer, map)
        end

        data_to_hash = ""

        log_debug("ndeg") {"map: #{map.map {|h,l| "#{h}: #{l.map(&:to_ntriples)}"}.join('; ')}"}
        log_depth do
          map.keys.sort.each do |hash|
            list = map[hash]
            # Iterate over related nodes
            chosen_path, chosen_issuer = "", nil
            data_to_hash += hash

            list.permutation do |permutation|
              log_debug("ndeg") {"perm: #{permutation.map(&:to_ntriples).join(",")}"}
              issuer_copy, path, recursion_list = issuer.dup, "", []

              permutation.each do |related|
                if canonical_issuer.identifier(related)
                  path << canonical_issuer.issue_identifier(related)
                else
                  recursion_list << related if !issuer_copy.identifier(related)
                  path << issuer_copy.issue_identifier(related)
                end

                # Skip to the next permutation if chosen path isn't empty and the path is greater than the chosen path
                break if !chosen_path.empty? && path.length >= chosen_path.length
              end
              log_debug("ndeg") {"hash: #{hash}, path: #{path}, recursion: #{recursion_list.map(&:to_ntriples)}"}

              recursion_list.each do |related|
                result = log_depth {hash_n_degree_quads(related, issuer_copy)}
                path << issuer_copy.issue_identifier(related)
                path << "<#{result.first}>"
                issuer_copy = result.last
                break if !chosen_path.empty? && path.length >= chosen_path.length && path > chosen_path
              end

              if chosen_path.empty? || path < chosen_path
                chosen_path, chosen_issuer = path, issuer_copy
              end
            end

            data_to_hash += chosen_path
            issuer = chosen_issuer
          end
        end

        log_debug("ndeg") {"datatohash: #{data_to_hash.inspect}, hash: #{hexdigest(data_to_hash)}"}
        return [hexdigest(data_to_hash), issuer]
      end

      protected

      def hexdigest(val)
        Digest::SHA256.hexdigest(val)
      end

      # Group adjacent bnodes by hash
      def hash_related_statement(identifier, statement, issuer, map)
        statement.to_h(:s, :p, :o, :g).each do |pos, term|
          next if !term.is_a?(RDF::Node) || term == identifier

          hash = log_depth {hash_related_node(term, statement, issuer, pos)}
          map[hash] ||= []
          map[hash] << term unless map[hash].include?(term)
        end
      end
    end

    class IdentifierIssuer
      def initialize(prefix = "_:c14n")
        @prefix, @counter, @issued = prefix, 0, {}
      end

      # Return an identifier for this BNode
      def issue_identifier(node)
        @issued[node] ||= begin
          res, @counter = @prefix + @counter.to_s, @counter + 1
          res
        end
      end

      def issued
        @issued.keys
      end

      def identifier(node)
        @issued[node]
      end

      # Duplicate this issuer, ensuring that the issued identifiers remain distinct
      # @return [IdentifierIssuer]
      def dup
        other = super
        other.instance_variable_set(:@issued, @issued.dup)
        other
      end
    end
  end
end
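
Callers that referenced the removed class directly need the new constant; a hedged before/after sketch (the `graph` variable is assumed, as in the earlier sketches):

    # rdf-normalize 0.5.0
    # RDF::Normalize::URDNA2015.new(graph).each { |st| puts st.to_nquads }

    # rdf-normalize 0.6.0
    RDF::Normalize::RDFC10.new(graph).each { |st| puts st.to_nquads }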