rdf-normalize 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: d7076dcfeccdbfc0b35ec046d0b338a6ad41d776
4
+ data.tar.gz: cd5f278797b575a3a6cced04890b9014c2350f42
5
+ SHA512:
6
+ metadata.gz: 2510cec72f19af6eef55678382f688a5948b59fad4c6be18465c53a66d16b25bb543dadbd5a5148675d291afac133c8cc7d5399650ad9043b858c1a1f6165291
7
+ data.tar.gz: f785bd00b4abacf7da181daae96e2101aa449b19791a08742654451cf9a3b25abdf368dd77d99a86c4f74a49084b2d9af6464ea30bbed45589f87713e899ee63
data/AUTHORS ADDED
@@ -0,0 +1 @@
1
+ * Gregg Kellogg <gregg@greggkellogg.net>
data/LICENSE ADDED
@@ -0,0 +1,25 @@
1
+ This is free and unencumbered software released into the public domain.
2
+
3
+ Anyone is free to copy, modify, publish, use, compile, sell, or
4
+ distribute this software, either in source code form or as a compiled
5
+ binary, for any purpose, commercial or non-commercial, and by any
6
+ means.
7
+
8
+ In jurisdictions that recognize copyright laws, the author or authors
9
+ of this software dedicate any and all copyright interest in the
10
+ software to the public domain. We make this dedication for the benefit
11
+ of the public at large and to the detriment of our heirs and
12
+ successors. We intend this dedication to be an overt act of
13
+ relinquishment in perpetuity of all present and future rights to this
14
+ software under copyright law.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
19
+ IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
20
+ OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
21
+ ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
22
+ OTHER DEALINGS IN THE SOFTWARE.
23
+
24
+ For more information, please refer to <http://unlicense.org>
25
+
data/README.md ADDED
@@ -0,0 +1,82 @@
1
+ # RDF::Normalize
2
+ RDF Graph normalizer for [RDF.rb][RDF.rb].
3
+
4
+ [![Gem Version](https://badge.fury.io/rb/rdf-normalize.png)](http://badge.fury.io/rb/rdf-normalize)
5
+ [![Build Status](https://secure.travis-ci.org/ruby-rdf/rdf-normalize.png?branch=master)](http://travis-ci.org/ruby-rdf/rdf-normalize)
6
+
7
+ ## Description
8
+ This is a [Ruby][] implementation of [RDF Normalize][] for [RDF.rb][].
9
+
10
+ ## Features
11
+ RDF::Normalize generates normalized [N-Quads][] output for an RDF Dataset using the algorithm
12
+ defined in [RDF Normalize][]. It also implements an RDF Writer interface, which can be used
13
+ to serialize normalized statements.
14
+
15
+ Algorithms implemented:
16
+
17
+ * [URGNA2012](http://json-ld.github.io/normalization/spec/index.html#dfn-urgna2012)
18
+ * [URDNA2015](http://json-ld.github.io/normalization/spec/index.html#dfn-urdna2015)
19
+
20
+ Install with `gem install rdf-normalize`
21
+
22
+ * 100% free and unencumbered [public domain](http://unlicense.org/) software.
23
+ * Compatible with Ruby >= 1.9.2.
24
+
25
+ ## Usage
26
+
27
+ ## Documentation
28
+ Full documentation available on [Rubydoc.info][Normalize doc]
29
+
30
+ ### Principal Classes
31
+ * {RDF::Normalize}
32
+ * {RDF::Normalize::Base}
33
+ * {RDF::Normalize::Format}
34
+ * {RDF::Normalize::Writer}
35
+ * {RDF::Normalize::URGNA2012}
36
+ * {RDF::Normalize::URDNA2015}
37
+
38
+
39
+ ## Dependencies
40
+
41
+ * [Ruby](http://ruby-lang.org/) (>= 1.9.2)
42
+ * [RDF.rb](http://rubygems.org/gems/rdf) (~> 1.1)
43
+
44
+ ## Installation
45
+
46
+ The recommended installation method is via [RubyGems](http://rubygems.org/).
47
+ To install the latest official release of the `RDF::Normalize` gem, do:
48
+
49
+ % [sudo] gem install rdf-normalize
50
+
51
+ ## Mailing List
52
+ * <http://lists.w3.org/Archives/Public/public-rdf-ruby/>
53
+
54
+ ## Author
55
+ * [Gregg Kellogg](http://github.com/gkellogg) - <http://greggkellogg.net/>
56
+
57
+ ## Contributing
58
+ * Do your best to adhere to the existing coding conventions and idioms.
59
+ * Don't use hard tabs, and don't leave trailing whitespace on any line.
60
+ * Do document every method you add using [YARD][] annotations. Read the
61
+ [tutorial][YARD-GS] or just look at the existing code for examples.
62
+ * Don't touch the `.gemspec`, `VERSION` or `AUTHORS` files. If you need to
63
+ change them, do so on your private branch only.
64
+ * Do feel free to add yourself to the `CREDITS` file and the corresponding
65
+ list in the `README`. Alphabetical order applies.
66
+ * Do note that in order for us to merge any non-trivial changes (as a rule
67
+ of thumb, additions larger than about 15 lines of code), we need an
68
+ explicit [public domain dedication][PDD] on record from you.
69
+
70
+ ## License
71
+ This is free and unencumbered public domain software. For more information,
72
+ see <http://unlicense.org/> or the accompanying {file:LICENSE} file.
73
+
74
+ [Ruby]: http://ruby-lang.org/
75
+ [RDF]: http://www.w3.org/RDF/
76
+ [YARD]: http://yardoc.org/
77
+ [YARD-GS]: http://rubydoc.info/docs/yard/file/docs/GettingStarted.md
78
+ [PDD]: http://lists.w3.org/Archives/Public/public-rdf-ruby/2010May/0013.html
79
+ [RDF.rb]: http://rubydoc.info/github/ruby-rdf/rdf
80
+ [N-Quads]: http://www.w3.org/TR/n-quads/
81
+ [RDF Normalize]:http://json-ld.github.io/normalization/spec/
82
+ [Normalize doc]:http://rubydoc.info/github/ruby-rdf/rdf-normalize/master/file/README.markdown
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.1.0
data/lib/rdf/normalize.rb ADDED
@@ -0,0 +1,66 @@
1
+ require 'rdf'
2
+
3
+ module RDF
4
+ ##
5
+ # **`RDF::Normalize`** is an RDF Graph normalization plugin for RDF.rb.
6
+ #
7
+ # @example Requiring the `RDF::Normalize` module
8
+ # require 'rdf/normalize'
9
+ #
10
+ # @example Returning an iterator for normalized statements
11
+ #
12
+ # g = RDF::Graph.load("etc/doap.ttl")
13
+ # RDF::Normalize.new(g).each_statement do |statement|
14
+ # puts statement.inspect
15
+ # end
16
+ #
17
+ # @example Returning normalized N-Quads
18
+ #
19
+ # g = RDF::Graph.load("etc/doap.ttl")
20
+ # g.dump(:normalize)
21
+ #
22
+ # @example Writing a repository as normalized N-Quads
23
+ #
24
+ # RDF::Normalize::Writer.open("etc/doap.nq") do |writer|
25
+ # writer << RDF::Repository.load("etc/doap.ttl")
26
+ # end
27
+ #
28
+ # @author [Gregg Kellogg](http://greggkellogg.net/)
29
+ module Normalize
30
+ require 'rdf/normalize/format'
31
+ require 'rdf/normalize/utils'
32
+ autoload :Base, 'rdf/normalize/base'
33
+ autoload :Carroll2001,'rdf/normalize/carroll2001'
34
+ autoload :URGNA2012, 'rdf/normalize/urgna2012'
35
+ autoload :URDNA2015, 'rdf/normalize/urdna2015'
36
+ autoload :VERSION, 'rdf/normalize/version'
37
+ autoload :Writer, 'rdf/normalize/writer'
38
+
39
+ # Enumerable to normalize
40
+ # @return [RDF::Enumerable]
41
+ attr_accessor :dataset
42
+
43
+ ALGORITHMS = {
44
+ carroll2001: :Carroll2001,
45
+ urgna2012: :URGNA2012,
46
+ urdna2015: :URDNA2015
47
+ }.freeze
48
+
49
+ ##
50
+ # Creates a new normalizer instance using either the specified or default normalizer algorithm
51
+ # @param [RDF::Enumerable] enumerable
52
+ # @param [Hash{Symbol => Object}] options
53
+ # @option options [Symbol] :algorithm (:urdna2015)
54
+ # One of `:carroll2001`, `:urgna2012`, or `:urdna2015`
55
+ # @return [RDF::Normalize::Base]
56
+ # @raise [ArgumentError] selected algorithm not defined
57
+ def new(enumerable, options = {})
58
+ algorithm = options.fetch(:algorithm, :urdna2015)
59
+ raise ArgumentError, "No algorithm defined for #{algorithm.to_sym}" unless ALGORITHMS.has_key?(algorithm)
60
+ algorithm_class = const_get(ALGORITHMS[algorithm])
61
+ algorithm_class.new(enumerable, options)
62
+ end
63
+ module_function :new
64
+
65
+ end
66
+ end
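The dispatch performed by `RDF::Normalize.new` can be sketched without RDF.rb. The following is a minimal, self-contained illustration, assuming two stand-in classes (`FakeURGNA2012` and `FakeURDNA2015`) and a module named `NormalizeSketch`; these names are hypothetical and are not part of the gem.

```ruby
# Hypothetical stand-ins for the algorithm classes; names are illustrative only.
module NormalizeSketch
  class FakeURGNA2012
    attr_reader :dataset

    def initialize(enumerable, options = {})
      @dataset = enumerable
    end
  end

  class FakeURDNA2015 < FakeURGNA2012; end

  # Maps the :algorithm option to a constant name, as ALGORITHMS does above.
  ALGORITHMS = {
    urgna2012: :FakeURGNA2012,
    urdna2015: :FakeURDNA2015
  }.freeze

  # Look up the class named by options[:algorithm] and instantiate it.
  def self.build(enumerable, options = {})
    algorithm = options.fetch(:algorithm, :urdna2015)
    unless ALGORITHMS.key?(algorithm)
      raise ArgumentError, "No algorithm defined for #{algorithm.inspect}"
    end
    const_get(ALGORITHMS[algorithm]).new(enumerable, options)
  end
end

default  = NormalizeSketch.build([])
explicit = NormalizeSketch.build([], algorithm: :urgna2012)
puts default.class
puts explicit.class
```

Keeping the symbol-to-constant table separate from the lookup means an unknown symbol fails early with `ArgumentError` instead of surfacing as a `NameError` from `const_get`.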
data/lib/rdf/normalize/base.rb ADDED
@@ -0,0 +1,15 @@
1
+ module RDF::Normalize
2
+ ##
3
+ # Abstract class for pluggable normalization algorithms. Delegates to a default or selected algorithm if instantiated
4
+ module Base
5
+ attr_reader :dataset
6
+
7
+ # Enumerates normalized statements
8
+ #
9
+ # @yield statement
10
+ # @yieldparam [RDF::Statement] statement
11
+ def each(&block)
12
+ raise "Not Implemented"
13
+ end
14
+ end
15
+ end
data/lib/rdf/normalize/carroll2001.rb ADDED
@@ -0,0 +1,166 @@
1
+ module RDF::Normalize
2
+ class Carroll2001
3
+ include RDF::Enumerable
4
+ include Base
5
+ include Utils
6
+
7
+ ##
8
+ # Create an enumerable with grounded nodes
9
+ #
10
+ # @param [RDF::Enumerable] enumerable
11
+ # @return [RDF::Enumerable]
12
+ def initialize(enumerable, options)
13
+ @dataset = enumerable
14
+ end
15
+
16
+ def each(&block)
17
+ ground_statements, anon_statements = [], []
18
+ dataset.each_statement do |statement|
19
+ (statement.has_blank_nodes? ? anon_statements : ground_statements) << statement
20
+ end
21
+
22
+ nodes = anon_statements.map(&:to_quad).flatten.compact.select(&:node?).uniq
23
+
24
+ # Create a hash signature of every node, based on the signature of
25
+ # statements it exists in.
26
+ # We also save hashes of nodes that cannot be reliably known; we will use
27
+ # that information to eliminate possible recursion combinations.
28
+ #
29
+ # Any mappings given in the method parameters are considered grounded.
30
+ hashes, ungrounded_hashes = hash_nodes(anon_statements, nodes, {})
31
+
32
+ # FIXME: likely need to iterate until hashes and ungrounded_hashes are the same size
33
+ while hashes.size != ungrounded_hashes.size
34
+ raise "Not done"
35
+ end
36
+
37
+ # Enumerate all statements, replacing nodes with new ground nodes using the hash as an identifier
38
+ ground_statements.each(&block)
39
+ anon_statements.each do |statement|
40
+ quad = statement.to_quad.compact.map do |term|
41
+ term.node? ? RDF::Node.intern(hashes[term]) : term
42
+ end
43
+ block.call RDF::Statement.from(quad)
44
+ end
45
+ end
46
+
47
+ private
48
+
49
+ # Given a set of statements, create a mapping of node => SHA1 for a given
50
+ # set of blank nodes.
51
+ #
52
+ # Returns a tuple of hashes: one of grounded hashes, and one of all
53
+ # hashes. grounded hashes are based on non-blank nodes and grounded blank
54
+ # nodes, and can be used to determine if a node's signature matches
55
+ # another.
56
+ #
57
+ # @param [Array] statements
58
+ # @param [Array] nodes
59
+ # @param [Hash] grounded_hashes
60
+ # mapping of node => SHA1 pairs as input, used to create more specific signatures of other nodes.
61
+ # @private
62
+ # @return [Hash, Hash]
63
+ def hash_nodes(statements, nodes, grounded_hashes)
64
+ hashes = grounded_hashes.dup
65
+ ungrounded_hashes = {}
66
+ hash_needed = true
67
+
68
+ # We may have to go over the list multiple times. If a node is marked as
69
+ # grounded, other nodes can then use it to decide their own state of
70
+ # grounded.
71
+ while hash_needed
72
+ starting_grounded_nodes = hashes.size
73
+ nodes.each do | node |
74
+ unless hashes.member? node
75
+ grounded, hash = node_hash_for(node, statements, hashes)
76
+ if grounded
77
+ hashes[node] = hash
78
+ end
79
+ ungrounded_hashes[node] = hash
80
+ end
81
+ end
82
+
83
+ # after going over the list, any nodes with a unique hash can be marked
84
+ # as grounded, even if we have not tied them back to a root yet.
85
+ uniques = {}
86
+ ungrounded_hashes.each do |node, hash|
87
+ uniques[hash] = uniques.has_key?(hash) ? false : node
88
+ end
89
+ uniques.each do |hash, node|
90
+ hashes[node] = hash if node
91
+ end
92
+ hash_needed = starting_grounded_nodes != hashes.size
93
+ end
94
+ [hashes, ungrounded_hashes]
95
+ end
96
+
97
+ # Generate a hash for a node based on the signature of the statements it
98
+ # appears in. Signatures consist of grounded elements in statements
99
+ # associated with a node, that is, anything but an ungrounded anonymous
100
+ # node. Creating the hash is simply hashing a sorted list of each
101
+ # statement's signature, which is itself a concatenation of the string form
102
+ # of all grounded elements.
103
+ #
104
+ # Nodes other than the given node are considered grounded if they are a
105
+ # member in the given hash.
106
+ #
107
+ # @param [RDF::Node] node
108
+ # @param [Array<RDF::Statement>] statements
109
+ # @param [Hash] hashes
110
+ # @return [Boolean, String]
111
+ # a tuple consisting of grounded being true or false and the String for the hash
112
+ def node_hash_for(node, statements, hashes)
113
+ statement_signatures = []
114
+ grounded = true
115
+ statements.each do | statement |
116
+ if statement.to_quad.include?(node)
117
+ statement_signatures << hash_string_for(statement, hashes, node)
118
+ statement.to_quad.compact.each do | resource |
119
+ grounded = false unless grounded?(resource, hashes) || resource == node
120
+ end
121
+ end
122
+ end
123
+ # Note that we sort the signatures--without a canonical ordering,
124
+ # we might get different hashes for equivalent nodes.
125
+ [grounded,Digest::SHA1.hexdigest(statement_signatures.sort.to_s)]
126
+ end
127
+
128
+ # Provide a string signature for the given statement, collecting
129
+ # string signatures for grounded node elements.
130
+ # @return [String]
131
+ def hash_string_for(statement, hashes, node)
132
+ statement.to_quad.map {|r| string_for_node(r, hashes, node)}.join("")
133
+ end
134
+
135
+ # Returns true if a given node is grounded
136
+ # A node is grounded if it is not a blank node or it is included
137
+ # in the given mapping of grounded nodes.
138
+ # @return [Boolean]
139
+ def grounded?(node, hashes)
140
+ (!(node.node?)) || (hashes.member? node)
141
+ end
142
+
143
+ # Provides a string for the given node for use in a string signature
144
+ # Non-anonymous nodes will return their string form. Grounded anonymous
145
+ # nodes will return their hashed form.
146
+ # @return [String]
147
+ def string_for_node(node, hashes, target)
148
+ case
149
+ when node.nil?
150
+ ""
151
+ when node == target
152
+ "itself"
153
+ when node.node? && hashes.member?(node)
154
+ hashes[node]
155
+ when node.node?
156
+ "a blank node"
157
+ # RDF.rb auto-boxing magic makes some literals the same when they
158
+ # should not be; the ntriples serializer will take care of us
159
+ when node.literal?
160
+ node.class.name + RDF::NTriples.serialize(node)
161
+ else
162
+ node.to_s
163
+ end
164
+ end
165
+ end
166
+ end
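The signature idea behind `node_hash_for` and `string_for_node` can be illustrated with plain strings. This is a simplified sketch, not the gem's implementation: terms are strings, anything starting with `_:` is treated as a blank node, and the helper names `blank?` and `signature_hash` are hypothetical. It ignores the grounded/ungrounded bookkeeping and only shows why sorting the per-statement signatures makes the hash independent of labels and statement order.

```ruby
require 'digest/sha1'

# Terms are plain strings here; anything starting with "_:" is treated as a
# blank node. This is a simplification, not RDF.rb's term model.
def blank?(term)
  term.start_with?("_:")
end

# Hash a blank node from the sorted signatures of the statements it appears in.
def signature_hash(node, statements)
  signatures = statements.select { |st| st.include?(node) }.map do |st|
    st.map do |term|
      if term == node then "itself"
      elsif blank?(term) then "a blank node"
      else term
      end
    end.join(" ")
  end
  # Sorting gives a canonical order, so statement order does not matter.
  Digest::SHA1.hexdigest(signatures.sort.join("\n"))
end

graph_a = [
  ["_:x", "<http://example.org/name>", '"Alice"'],
  ["_:x", "<http://example.org/knows>", "_:y"]
]
# The same graph with different blank node labels and statement order.
graph_b = [
  ["_:n2", "<http://example.org/knows>", "_:n1"],
  ["_:n2", "<http://example.org/name>", '"Alice"']
]

puts signature_hash("_:x", graph_a) == signature_hash("_:n2", graph_b)
```

Because the node itself and other blank nodes are replaced by fixed placeholders before hashing, the two differently labeled graphs yield the same digest for the corresponding node.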
data/lib/rdf/normalize/format.rb ADDED
@@ -0,0 +1,11 @@
1
+ require 'rdf/nquads'
2
+
3
+ module RDF::Normalize
4
+ class Format < RDF::Format
5
+ content_encoding 'utf-8'
6
+
7
+ # It reads like normal N-Quads
8
+ reader { RDF::NQuads::Reader}
9
+ writer { RDF::Normalize::Writer }
10
+ end
11
+ end
data/lib/rdf/normalize/urdna2015.rb ADDED
@@ -0,0 +1,264 @@
1
+ module RDF::Normalize
2
+ class URDNA2015
3
+ include RDF::Enumerable
4
+ include Base
5
+ include Utils
6
+
7
+ ##
8
+ # Create an enumerable with grounded nodes
9
+ #
10
+ # @param [RDF::Enumerable] enumerable
11
+ # @return [RDF::Enumerable]
12
+ def initialize(enumerable, options)
13
+ @dataset, @options = enumerable, options
14
+ end
15
+
16
+ def each(&block)
17
+ ns = NormalizationState.new(@options)
18
+ normalize_statements(ns, &block)
19
+ end
20
+
21
+ protected
22
+ def normalize_statements(ns, &block)
23
+ # Map BNodes to the statements they are used by
24
+ dataset.each_statement do |statement|
25
+ statement.to_quad.compact.select(&:node?).each do |node|
26
+ ns.add_statement(node, statement)
27
+ end
28
+ end
29
+
30
+ non_normalized_identifiers, simple = ns.bnode_to_statements.keys, true
31
+
32
+ while simple
33
+ simple = false
34
+ ns.hash_to_bnodes = {}
35
+
36
+ # Calculate hashes for first degree nodes
37
+ non_normalized_identifiers.each do |node|
38
+ hash = depth {ns.hash_first_degree_quads(node)}
39
+ debug("1deg") {"hash: #{hash}"}
40
+ ns.add_bnode_hash(node, hash)
41
+ end
42
+
43
+ # Create canonical replacements for hashes mapping to a single node
44
+ ns.hash_to_bnodes.keys.sort.each do |hash|
45
+ identifier_list = ns.hash_to_bnodes[hash]
46
+ next if identifier_list.length > 1
47
+ node = identifier_list.first
48
+ id = ns.canonical_issuer.issue_identifier(node)
49
+ debug("single node") {"node: #{node.to_ntriples}, hash: #{hash}, id: #{id}"}
50
+ non_normalized_identifiers -= identifier_list
51
+ ns.hash_to_bnodes.delete(hash)
52
+ simple = true
53
+ end
54
+ end
55
+
56
+ # Iterate over hashes having more than one node
57
+ ns.hash_to_bnodes.keys.sort.each do |hash|
58
+ identifier_list = ns.hash_to_bnodes[hash]
59
+
60
+ debug("multiple nodes") {"node: #{identifier_list.map(&:to_ntriples).join(",")}, hash: #{hash}"}
61
+ hash_path_list = []
62
+
63
+ # Create a hash_path_list for all bnodes using a temporary identifier used to create canonical replacements
64
+ identifier_list.each do |identifier|
65
+ next if ns.canonical_issuer.issued.include?(identifier)
66
+ temporary_issuer = IdentifierIssuer.new("_:b")
67
+ temporary_issuer.issue_identifier(identifier)
68
+ hash_path_list << depth {ns.hash_n_degree_quads(identifier, temporary_issuer)}
69
+ end
70
+ debug("->") {"hash_path_list: #{hash_path_list.map(&:first).inspect}"}
71
+
72
+ # Create canonical replacements for nodes
73
+ hash_path_list.sort_by(&:first).map(&:last).each do |issuer|
74
+ issuer.issued.each do |node|
75
+ id = ns.canonical_issuer.issue_identifier(node)
76
+ debug("-->") {"node: #{node.to_ntriples}, id: #{id}"}
77
+ end
78
+ end
79
+ end
80
+
81
+ # Yield statements using BNodes from canonical replacements
82
+ dataset.each_statement do |statement|
83
+ if statement.has_blank_nodes?
84
+ quad = statement.to_quad.compact.map do |term|
85
+ term.node? ? RDF::Node.intern(ns.canonical_issuer.identifier(term)[2..-1]) : term
86
+ end
87
+ block.call RDF::Statement.from(quad)
88
+ else
89
+ block.call statement
90
+ end
91
+ end
92
+ end
93
+
94
+ private
95
+
96
+ class NormalizationState
97
+ include Utils
98
+
99
+ attr_accessor :bnode_to_statements
100
+ attr_accessor :hash_to_bnodes
101
+ attr_accessor :canonical_issuer
102
+
103
+ def initialize(options)
104
+ @options = options
105
+ @bnode_to_statements, @hash_to_bnodes, @canonical_issuer = {}, {}, IdentifierIssuer.new("_:c14n")
106
+ end
107
+
108
+ def add_statement(node, statement)
109
+ bnode_to_statements[node] ||= []
110
+ bnode_to_statements[node] << statement unless bnode_to_statements[node].include?(statement)
111
+ end
112
+
113
+ def add_bnode_hash(node, hash)
114
+ hash_to_bnodes[hash] ||= []
115
+ hash_to_bnodes[hash] << node unless hash_to_bnodes[hash].include?(node)
116
+ end
117
+
118
+ # @param [RDF::Node] node
119
+ # @return [String] the SHA1 hexdigest hash of statements using this node, with replacements
120
+ def hash_first_degree_quads(node)
121
+ quads = bnode_to_statements[node].
122
+ map do |statement|
123
+ quad = statement.to_quad.map do |t|
124
+ case t
125
+ when node then RDF::Node("a")
126
+ when RDF::Node then RDF::Node("z")
127
+ else t
128
+ end
129
+ end
130
+ RDF::NQuads::Writer.serialize(RDF::Statement.from(quad))
131
+ end
132
+
133
+ debug("1deg") {"node: #{node}, quads: #{quads}"}
134
+ hexdigest(quads.sort.join)
135
+ end
136
+
137
+ # @param [RDF::Node] related
138
+ # @param [RDF::Statement] statement
139
+ # @param [IdentifierIssuer] issuer
140
+ # @param [Symbol] position one of :s, :o, or :g
141
+ # @return [String] the SHA1 hexdigest hash
142
+ def hash_related_node(related, statement, issuer, position)
143
+ identifier = canonical_issuer.identifier(related) ||
144
+ issuer.identifier(related) ||
145
+ hash_first_degree_quads(related)
146
+ input = position.to_s
147
+ input << statement.predicate.to_ntriples unless position == :g
148
+ input << identifier
149
+ debug("hrel") {"input: #{input.inspect}, hash: #{hexdigest(input)}"}
150
+ hexdigest(input)
151
+ end
152
+
153
+ # @param [RDF::Node] identifier
154
+ # @param [IdentifierIssuer] issuer
155
+ # @return [Array<String,IdentifierIssuer>] the Hash and issuer
156
+ def hash_n_degree_quads(identifier, issuer)
157
+ debug("ndeg") {"identifier: #{identifier.to_ntriples}"}
158
+
159
+ # hash to related blank nodes map
160
+ map = {}
161
+
162
+ bnode_to_statements[identifier].each do |statement|
163
+ hash_related_statement(identifier, statement, issuer, map)
164
+ end
165
+
166
+ data_to_hash = ""
167
+
168
+ debug("ndeg") {"map: #{map.map {|h,l| "#{h}: #{l.map(&:to_ntriples)}"}.join('; ')}"}
169
+ depth do
170
+ map.keys.sort.each do |hash|
171
+ list = map[hash]
172
+ # Iterate over related nodes
173
+ chosen_path, chosen_issuer = "", nil
174
+ data_to_hash += hash
175
+
176
+ list.permutation do |permutation|
177
+ debug("ndeg") {"perm: #{permutation.map(&:to_ntriples).join(",")}"}
178
+ issuer_copy, path, recursion_list = issuer.dup, "", []
179
+
180
+ permutation.each do |related|
181
+ if canonical_issuer.identifier(related)
182
+ path << canonical_issuer.issue_identifier(related)
183
+ else
184
+ recursion_list << related if !issuer_copy.identifier(related)
185
+ path << issuer_copy.issue_identifier(related)
186
+ end
187
+
188
+ # Skip to the next permutation if chosen path isn't empty and the path is greater than the chosen path
189
+ break if !chosen_path.empty? && path.length >= chosen_path.length
190
+ end
191
+ debug("ndeg") {"hash: #{hash}, path: #{path}, recursion: #{recursion_list.map(&:to_ntriples)}"}
192
+
193
+ recursion_list.each do |related|
194
+ result = depth {hash_n_degree_quads(related, issuer_copy)}
195
+ path << issuer_copy.issue_identifier(related)
196
+ path << "<#{result.first}>"
197
+ issuer_copy = result.last
198
+ break if !chosen_path.empty? && path.length >= chosen_path.length && path > chosen_path
199
+ end
200
+
201
+ if chosen_path.empty? || path < chosen_path
202
+ chosen_path, chosen_issuer = path, issuer_copy
203
+ end
204
+ end
205
+
206
+ data_to_hash += chosen_path
207
+ issuer = chosen_issuer
208
+ end
209
+ end
210
+
211
+ debug("ndeg") {"datatohash: #{data_to_hash.inspect}, hash: #{hexdigest(data_to_hash)}"}
212
+ return [hexdigest(data_to_hash), issuer]
213
+ end
214
+
215
+ protected
216
+
217
+ # FIXME: should be SHA-256.
218
+ def hexdigest(val)
219
+ Digest::SHA1.hexdigest(val)
220
+ end
221
+
222
+ # Group adjacent bnodes by hash
223
+ def hash_related_statement(identifier, statement, issuer, map)
224
+ statement.to_hash(:s, :p, :o, :g).each do |pos, term|
225
+ next if !term.is_a?(RDF::Node) || term == identifier
226
+
227
+ hash = depth {hash_related_node(term, statement, issuer, pos)}
228
+ map[hash] ||= []
229
+ map[hash] << term unless map[hash].include?(term)
230
+ end
231
+ end
232
+ end
233
+
234
+ class IdentifierIssuer
235
+ def initialize(prefix = "_:c14n")
236
+ @prefix, @counter, @issued = prefix, 0, {}
237
+ end
238
+
239
+ # Return an identifier for this BNode
240
+ def issue_identifier(node)
241
+ @issued[node] ||= begin
242
+ res, @counter = @prefix + @counter.to_s, @counter + 1
243
+ res
244
+ end
245
+ end
246
+
247
+ def issued
248
+ @issued.keys
249
+ end
250
+
251
+ def identifier(node)
252
+ @issued[node]
253
+ end
254
+
255
+ # Duplicate this issuer, ensuring that the issued identifiers remain distinct
256
+ # @return [IdentifierIssuer]
257
+ def dup
258
+ other = super
259
+ other.instance_variable_set(:@issued, @issued.dup)
260
+ other
261
+ end
262
+ end
263
+ end
264
+ end
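The first-degree pass in `normalize_statements` can be sketched with strings instead of RDF.rb terms. This is a rough illustration under stated assumptions: quads are arrays of strings, `_:`-prefixed strings stand in for blank nodes, and `first_degree_hash` and `SketchIssuer` are hypothetical names. It performs a single pass and leaves nodes with shared hashes to the N-degree step, which is not shown.

```ruby
require 'digest/sha1'

# Quads are arrays of strings; "_:"-prefixed strings stand in for blank nodes.
# The reference node becomes _:a and every other blank node becomes _:z.
def first_degree_hash(node, quads)
  lines = quads.select { |q| q.include?(node) }.map do |q|
    q.map { |t| t == node ? "_:a" : (t.start_with?("_:") ? "_:z" : t) }.join(" ") + " .\n"
  end
  Digest::SHA1.hexdigest(lines.sort.join)
end

# Issues sequential identifiers and remembers what was already issued.
class SketchIssuer
  def initialize(prefix = "_:c14n")
    @prefix, @counter, @issued = prefix, 0, {}
  end

  def issue(node)
    @issued[node] ||= begin
      id = "#{@prefix}#{@counter}"
      @counter += 1
      id
    end
  end
end

quads = [
  ["_:e0", "<http://example.org/p>", "_:e1"],
  ["_:e1", "<http://example.org/q>", '"leaf"']
]

issuer = SketchIssuer.new
nodes = quads.flatten.select { |t| t.start_with?("_:") }.uniq
by_hash = nodes.group_by { |n| first_degree_hash(n, quads) }

canonical = {}
by_hash.keys.sort.each do |hash|
  candidates = by_hash[hash]
  next if candidates.length > 1 # shared hashes need the N-degree step
  canonical[candidates.first] = issuer.issue(candidates.first)
end
p canonical
```

Issuing identifiers in sorted hash order is what makes the resulting labels depend only on graph structure and not on the input labels.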
data/lib/rdf/normalize/urgna2012.rb ADDED
@@ -0,0 +1,47 @@
1
+ module RDF::Normalize
2
+ class URGNA2012 < URDNA2015
3
+
4
+ def each(&block)
5
+ ns = NormalizationState.new(@options)
6
+ normalize_statements(ns, &block)
7
+ end
8
+
9
+ class NormalizationState < URDNA2015::NormalizationState
10
+ protected
11
+
12
+ # 2012 version uses SHA-1
13
+ def hexdigest(val)
14
+ Digest::SHA1.hexdigest(val)
15
+ end
16
+
17
+ # @param [RDF::Node] related
18
+ # @param [RDF::Statement] statement
19
+ # @param [IdentifierIssuer] issuer
20
+ # @param [Symbol] position the direction, one of :p or :r
21
+ # @return [String] the SHA1 hexdigest hash
22
+ def hash_related_node(related, statement, issuer, position)
23
+ identifier = canonical_issuer.identifier(related) ||
24
+ issuer.identifier(related) ||
25
+ hash_first_degree_quads(related)
26
+ input = position.to_s
27
+ input << statement.predicate.to_s
28
+ input << identifier
29
+ debug("hrel") {"input: #{input.inspect}, hash: #{hexdigest(input)}"}
30
+ hexdigest(input)
31
+ end
32
+
33
+ # In URGNA2012, the position parameter passed to the Hash Related Blank Node algorithm was instead modeled as a direction parameter, where it could have the value p, for property, when the related blank node was a `subject` and the value r, for reverse or reference, when the related blank node was an `object`. Since URGNA2012 only normalized graphs, not datasets, there was no use of the `graph` position.
34
+ def hash_related_statement(identifier, statement, issuer, map)
35
+ if statement.subject.node? && statement.subject != identifier
36
+ hash = depth {hash_related_node(statement.subject, statement, issuer, :p)}
37
+ map[hash] ||= []
38
+ map[hash] << statement.subject unless map[hash].include?(statement.subject)
39
+ elsif statement.object.node? && statement.object != identifier
40
+ hash = depth {hash_related_node(statement.object, statement, issuer, :r)}
41
+ map[hash] ||= []
42
+ map[hash] << statement.object unless map[hash].include?(statement.object)
43
+ end
44
+ end
45
+ end
46
+ end
47
+ end
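The difference between the two `hash_related_node` variants lies in the string that gets hashed. The sketch below builds that input for each variant using plain strings. It assumes that `to_ntriples` wraps an IRI in angle brackets while `to_s` returns the bare IRI; the function names `related_input_2015` and `related_input_2012` are hypothetical.

```ruby
require 'digest/sha1'

# URDNA2015-style input: position (:s, :o, :g), then the predicate in
# N-Triples form (omitted for the graph position), then the identifier.
def related_input_2015(position, predicate_iri, identifier)
  input = position.to_s
  input += "<#{predicate_iri}>" unless position == :g
  input + identifier
end

# URGNA2012-style input: direction (:p or :r), the bare predicate IRI,
# then the identifier.
def related_input_2012(direction, predicate_iri, identifier)
  direction.to_s + predicate_iri + identifier
end

pred = "http://example.org/knows"
puts related_input_2015(:o, pred, "_:c14n0")
puts related_input_2012(:r, pred, "_:c14n0")
puts Digest::SHA1.hexdigest(related_input_2015(:o, pred, "_:c14n0"))
```

Since the inputs differ in both the leading marker and the predicate form, the two algorithms produce different hashes for the same related node.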
data/lib/rdf/normalize/utils.rb ADDED
@@ -0,0 +1,33 @@
1
+ module RDF::Normalize
2
+ module Utils
3
+ # Add debug event to debug array, if specified
4
+ #
5
+ # @param [String] message
6
+ # @yieldreturn [String] appended to message, to allow for lazy evaluation of message
7
+ def debug(*args)
8
+ options = args.last.is_a?(Hash) ? args.pop : {}
9
+ return unless options[:debug] || @options[:debug]
10
+ depth = options[:depth] || @options[:depth]
11
+ d_str = depth > 100 ? ' ' * 100 + '+' : ' ' * depth
12
+ list = args
13
+ list << yield if block_given?
14
+ message = d_str + (list.empty? ? "" : list.join(": "))
15
+ options[:debug] << message if options[:debug].is_a?(Array)
16
+ @options[:debug] << message if @options[:debug].is_a?(Array)
17
+ $stderr.puts(message) if @options[:debug] == true
18
+ end
19
+ module_function :debug
20
+
21
+ # Increase depth around a method invocation
22
+ # @yield
23
+ # Yields with no arguments
24
+ # @yieldreturn [Object] returns the result of yielding
25
+ # @return [Object]
26
+ def depth
27
+ @options[:depth] += 1
28
+ ret = yield
29
+ @options[:depth] -= 1
30
+ ret
31
+ end
32
+ end
33
+ end
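The `depth` and `debug` helpers pair a nesting counter with lazily built messages. The following sketch shows the same pattern in isolation, using a hypothetical `DepthTracer` class that is not part of the gem. As a design variation, it restores the counter in an `ensure` clause so that an exception inside the block does not leave the depth incremented; the original code does not do this.

```ruby
# Hypothetical tracer class; the names here are illustrative only.
class DepthTracer
  attr_reader :messages

  def initialize
    @depth = 0
    @messages = []
  end

  # Run the block one level deeper, restoring the depth even if it raises.
  def depth
    @depth += 1
    yield
  ensure
    @depth -= 1
  end

  # Record an indented message; the block is evaluated only when called.
  def debug(label)
    @messages << ("  " * @depth) + "#{label}: #{yield}"
  end
end

tracer = DepthTracer.new
tracer.debug("outer") { "start" }
tracer.depth { tracer.debug("inner") { "nested" } }
begin
  tracer.depth { raise "boom" }
rescue RuntimeError
  # The depth counter has already been restored by the ensure clause.
end
tracer.debug("outer") { "end" }
puts tracer.messages
```

Passing the message as a block means the string interpolation cost is paid only when a message is actually recorded.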
data/lib/rdf/normalize/writer.rb ADDED
@@ -0,0 +1,79 @@
1
+ module RDF::Normalize
2
+ ##
3
+ # An RDF Graph normalization serializer.
4
+ #
5
+ # Normalizes the enumerated statements into normal form in the form of N-Quads.
6
+ #
7
+ # @author [Gregg Kellogg](http://kellogg-assoc.com/)
8
+ class Writer < RDF::NQuads::Writer
9
+ format RDF::Normalize::Format
10
+
11
+ # @attr_accessor [RDF::Repository] Repository of statements to be serialized
12
+ attr_accessor :repo
13
+
14
+ ##
15
+ # Initializes the writer instance.
16
+ #
17
+ # @param [IO, File] output
18
+ # the output stream
19
+ # @param [Hash{Symbol => Object}] options
20
+ # any additional options
21
+ # @yield [writer] `self`
22
+ # @yieldparam [RDF::Writer] writer
23
+ # @yieldreturn [void]
24
+ # @yield [writer]
25
+ # @yieldparam [RDF::Writer] writer
26
+ def initialize(output = $stdout, options = {}, &block)
27
+ super do
28
+ @options[:depth] ||= 0
29
+ @repo = RDF::Repository.new
30
+ if block_given?
31
+ case block.arity
32
+ when 0 then instance_eval(&block)
33
+ else block.call(self)
34
+ end
35
+ end
36
+ end
37
+ end
38
+
39
+ ##
40
+ # Defer writing to epilogue
41
+ def write_statement(statement)
42
+ self
43
+ end
44
+
45
+ ##
46
+ # Outputs the Graph representation of all stored triples.
47
+ #
48
+ # @return [void]
49
+ def write_epilogue
50
+ statements = RDF::Normalize.new(@repo, @options).
51
+ statements.
52
+ reject(&:variable?).
53
+ map {|s| format_statement(s)}.
54
+ sort.
55
+ each do |line|
56
+ puts line
57
+ end
58
+ end
59
+
60
+ protected
61
+
62
+ ##
63
+ # Adds a statement to be serialized
64
+ # @param [RDF::Statement] statement
65
+ # @return [void]
66
+ def insert_statement(statement)
67
+ @repo.insert(statement)
68
+ end
69
+
70
+ ##
71
+ # Insert an Enumerable
72
+ #
73
+ # @param [RDF::Enumerable] enumerable
74
+ # @return [void]
75
+ def insert_statements(enumerable)
76
+ @repo = enumerable
77
+ end
78
+ end
79
+ end
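The writer defers all output to `write_epilogue` because canonical blank node labels can only be assigned once every statement has been seen. The buffering pattern can be sketched without RDF.rb as follows. `DeferredWriter` and its `finish` method are hypothetical names, and the lines are assumed to be already-formatted N-Quads strings.

```ruby
require 'stringio'

# Hypothetical writer that buffers lines and emits them sorted at the end.
class DeferredWriter
  def initialize(output)
    @output = output
    @lines = []
  end

  # Buffer a line instead of writing it immediately.
  def <<(line)
    @lines << line
    self
  end

  # Emit everything in sorted order once all input has been collected.
  def finish
    @lines.sort.each { |line| @output.puts(line) }
  end
end

out = StringIO.new
writer = DeferredWriter.new(out)
writer << '<http://example.org/b> <http://example.org/p> "2" .'
writer << '<http://example.org/a> <http://example.org/p> "1" .'
writer.finish
puts out.string
```

Sorting at the end gives a stable line order regardless of the order in which statements were added.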
metadata ADDED
@@ -0,0 +1,160 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: rdf-normalize
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Gregg Kellogg
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2015-05-20 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rdf
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.1'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.1'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rdf-spec
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '1.1'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '1.1'
41
+ - !ruby/object:Gem::Dependency
42
+ name: open-uri-cached
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '0.0'
48
+ - - ">="
49
+ - !ruby/object:Gem::Version
50
+ version: 0.0.5
51
+ type: :development
52
+ prerelease: false
53
+ version_requirements: !ruby/object:Gem::Requirement
54
+ requirements:
55
+ - - "~>"
56
+ - !ruby/object:Gem::Version
57
+ version: '0.0'
58
+ - - ">="
59
+ - !ruby/object:Gem::Version
60
+ version: 0.0.5
61
+ - !ruby/object:Gem::Dependency
62
+ name: rspec
63
+ requirement: !ruby/object:Gem::Requirement
64
+ requirements:
65
+ - - "~>"
66
+ - !ruby/object:Gem::Version
67
+ version: '3.2'
68
+ type: :development
69
+ prerelease: false
70
+ version_requirements: !ruby/object:Gem::Requirement
71
+ requirements:
72
+ - - "~>"
73
+ - !ruby/object:Gem::Version
74
+ version: '3.2'
75
+ - !ruby/object:Gem::Dependency
76
+ name: webmock
77
+ requirement: !ruby/object:Gem::Requirement
78
+ requirements:
79
+ - - "~>"
80
+ - !ruby/object:Gem::Version
81
+ version: '1.17'
82
+ type: :development
83
+ prerelease: false
84
+ version_requirements: !ruby/object:Gem::Requirement
85
+ requirements:
86
+ - - "~>"
87
+ - !ruby/object:Gem::Version
88
+ version: '1.17'
89
+ - !ruby/object:Gem::Dependency
90
+ name: json-ld
91
+ requirement: !ruby/object:Gem::Requirement
92
+ requirements:
93
+ - - "~>"
94
+ - !ruby/object:Gem::Version
95
+ version: '1.1'
96
+ type: :development
97
+ prerelease: false
98
+ version_requirements: !ruby/object:Gem::Requirement
99
+ requirements:
100
+ - - "~>"
101
+ - !ruby/object:Gem::Version
102
+ version: '1.1'
103
+ - !ruby/object:Gem::Dependency
104
+ name: yard
105
+ requirement: !ruby/object:Gem::Requirement
106
+ requirements:
107
+ - - "~>"
108
+ - !ruby/object:Gem::Version
109
+ version: '0.8'
110
+ type: :development
111
+ prerelease: false
112
+ version_requirements: !ruby/object:Gem::Requirement
113
+ requirements:
114
+ - - "~>"
115
+ - !ruby/object:Gem::Version
116
+ version: '0.8'
117
+ description: RDF::Normalize is a Graph normalizer for the RDF.rb library suite.
118
+ email: public-rdf-ruby@w3.org
119
+ executables: []
120
+ extensions: []
121
+ extra_rdoc_files: []
122
+ files:
123
+ - AUTHORS
124
+ - LICENSE
125
+ - README.md
126
+ - VERSION
127
+ - lib/rdf/normalize.rb
128
+ - lib/rdf/normalize/base.rb
129
+ - lib/rdf/normalize/carroll2001.rb
130
+ - lib/rdf/normalize/format.rb
131
+ - lib/rdf/normalize/urdna2015.rb
132
+ - lib/rdf/normalize/urgna2012.rb
133
+ - lib/rdf/normalize/utils.rb
134
+ - lib/rdf/normalize/writer.rb
135
+ homepage: http://github.com/gkellogg/rdf-normalize
136
+ licenses:
137
+ - Public Domain
138
+ metadata: {}
139
+ post_install_message:
140
+ rdoc_options: []
141
+ require_paths:
142
+ - lib
143
+ required_ruby_version: !ruby/object:Gem::Requirement
144
+ requirements:
145
+ - - ">="
146
+ - !ruby/object:Gem::Version
147
+ version: 1.9.2
148
+ required_rubygems_version: !ruby/object:Gem::Requirement
149
+ requirements:
150
+ - - ">="
151
+ - !ruby/object:Gem::Version
152
+ version: '0'
153
+ requirements: []
154
+ rubyforge_project: rdf-normalize
155
+ rubygems_version: 2.4.7
156
+ signing_key:
157
+ specification_version: 4
158
+ summary: RDF Graph normalizer for Ruby.
159
+ test_files: []
160
+ has_rdoc: false