rdf-normalize 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/AUTHORS +1 -0
- data/LICENSE +25 -0
- data/README.md +82 -0
- data/VERSION +1 -0
- data/lib/rdf/normalize.rb +66 -0
- data/lib/rdf/normalize/base.rb +15 -0
- data/lib/rdf/normalize/carroll2001.rb +166 -0
- data/lib/rdf/normalize/format.rb +11 -0
- data/lib/rdf/normalize/urdna2015.rb +264 -0
- data/lib/rdf/normalize/urgna2012.rb +47 -0
- data/lib/rdf/normalize/utils.rb +33 -0
- data/lib/rdf/normalize/writer.rb +79 -0
- metadata +160 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA1:
  metadata.gz: d7076dcfeccdbfc0b35ec046d0b338a6ad41d776
  data.tar.gz: cd5f278797b575a3a6cced04890b9014c2350f42
SHA512:
  metadata.gz: 2510cec72f19af6eef55678382f688a5948b59fad4c6be18465c53a66d16b25bb543dadbd5a5148675d291afac133c8cc7d5399650ad9043b858c1a1f6165291
  data.tar.gz: f785bd00b4abacf7da181daae96e2101aa449b19791a08742654451cf9a3b25abdf368dd77d99a86c4f74a49084b2d9af6464ea30bbed45589f87713e899ee63

data/AUTHORS
ADDED
@@ -0,0 +1 @@
* Gregg Kellogg <gregg@greggkellogg.net>

data/LICENSE
ADDED
@@ -0,0 +1,25 @@
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <http://unlicense.org>

data/README.md
ADDED
@@ -0,0 +1,82 @@
# RDF::Normalize
RDF Graph normalizer for [RDF.rb][RDF.rb].

[Gem Version](http://badge.fury.io/rb/rdf-normalize)
[Build Status](http://travis-ci.org/ruby-rdf/rdf-normalize)

## Description
This is a [Ruby][] implementation of [RDF Normalize][] for [RDF.rb][].

## Features
RDF::Normalize generates normalized [N-Quads][] output for an RDF Dataset using the algorithm
defined in [RDF Normalize][]. It also implements an RDF Writer interface, which can be used
to serialize normalized statements.

Algorithms implemented:

* [URGNA2012](http://json-ld.github.io/normalization/spec/index.html#dfn-urgna2012)
* [URDNA2015](http://json-ld.github.io/normalization/spec/index.html#dfn-urdna2015)

Install with `gem install rdf-normalize`

* 100% free and unencumbered [public domain](http://unlicense.org/) software.
* Compatible with Ruby >= 1.9.3.

## Usage

## Documentation
Full documentation is available on [Rubydoc.info][Normalize doc].

### Principal Classes
* {RDF::Normalize}
* {RDF::Normalize::Base}
* {RDF::Normalize::Format}
* {RDF::Normalize::Writer}
* {RDF::Normalize::URGNA2012}
* {RDF::Normalize::URDNA2015}


## Dependencies

* [Ruby](http://ruby-lang.org/) (>= 1.9.2)
* [RDF.rb](http://rubygems.org/gems/rdf) (~> 1.1)

## Installation

The recommended installation method is via [RubyGems](http://rubygems.org/).
To install the latest official release of the `RDF::Normalize` gem, do:

    % [sudo] gem install rdf-normalize

## Mailing List
* <http://lists.w3.org/Archives/Public/public-rdf-ruby/>

## Author
* [Gregg Kellogg](http://github.com/gkellogg) - <http://greggkellogg.net/>

## Contributing
* Do your best to adhere to the existing coding conventions and idioms.
* Don't use hard tabs, and don't leave trailing whitespace on any line.
* Do document every method you add using [YARD][] annotations. Read the
  [tutorial][YARD-GS] or just look at the existing code for examples.
* Don't touch the `.gemspec`, `VERSION` or `AUTHORS` files. If you need to
  change them, do so on your private branch only.
* Do feel free to add yourself to the `CREDITS` file and the corresponding
  list in the `README`. Alphabetical order applies.
* Do note that in order for us to merge any non-trivial changes (as a rule
  of thumb, additions larger than about 15 lines of code), we need an
  explicit [public domain dedication][PDD] on record from you.

## License
This is free and unencumbered public domain software. For more information,
see <http://unlicense.org/> or the accompanying {file:LICENSE} file.

[Ruby]: http://ruby-lang.org/
[RDF]: http://www.w3.org/RDF/
[YARD]: http://yardoc.org/
[YARD-GS]: http://rubydoc.info/docs/yard/file/docs/GettingStarted.md
[PDD]: http://lists.w3.org/Archives/Public/public-rdf-ruby/2010May/0013.html
[RDF.rb]: http://rubydoc.info/github/ruby-rdf/rdf
[N-Quads]: http://www.w3.org/TR/n-quads/
[RDF Normalize]: http://json-ld.github.io/normalization/spec/
[Normalize doc]: http://rubydoc.info/github/ruby-rdf/rdf-normalize/master/file/README.markdown

data/VERSION
ADDED
@@ -0,0 +1 @@
0.1.0

data/lib/rdf/normalize.rb
ADDED
@@ -0,0 +1,66 @@
require 'rdf'

module RDF
  ##
  # **`RDF::Normalize`** is an RDF Graph normalization plugin for RDF.rb.
  #
  # @example Requiring the `RDF::Normalize` module
  #   require 'rdf/normalize'
  #
  # @example Returning an iterator for normalized statements
  #
  #   g = RDF::Graph.load("etc/doap.ttl")
  #   RDF::Normalize.new(g).each_statement do |statement|
  #     puts statement.inspect
  #   end
  #
  # @example Returning normalized N-Quads
  #
  #   g = RDF::Graph.load("etc/doap.ttl")
  #   g.dump(:normalize)
  #
  # @example Writing a repository as normalized N-Quads
  #
  #   RDF::Normalize::Writer.open("etc/doap.nq") do |writer|
  #     writer << RDF::Repository.load("etc/doap.ttl")
  #   end
  #
  # @author [Gregg Kellogg](http://greggkellogg.net/)
  module Normalize
    require 'rdf/normalize/format'
    require 'rdf/normalize/utils'
    autoload :Base,        'rdf/normalize/base'
    autoload :Carroll2001, 'rdf/normalize/carroll2001'
    autoload :URGNA2012,   'rdf/normalize/urgna2012'
    autoload :URDNA2015,   'rdf/normalize/urdna2015'
    autoload :VERSION,     'rdf/normalize/version'
    autoload :Writer,      'rdf/normalize/writer'

    # Enumerable to normalize
    # @return [RDF::Enumerable]
    attr_accessor :dataset

    ALGORITHMS = {
      carroll2001: :Carroll2001,
      urgna2012:   :URGNA2012,
      urdna2015:   :URDNA2015
    }.freeze

    ##
    # Creates a new normalizer instance using either the specified or default normalizer algorithm
    # @param [RDF::Enumerable] enumerable
    # @param [Hash{Symbol => Object}] options
    # @option options [Symbol] :algorithm (:urdna2015)
    #   One of `:carroll2001`, `:urgna2012`, or `:urdna2015`
    # @return [RDF::Normalize::Base]
    # @raise [ArgumentError] selected algorithm not defined
    def new(enumerable, options = {})
      algorithm = options.fetch(:algorithm, :urdna2015)
      raise ArgumentError, "No algorithm defined for #{algorithm.to_sym}" unless ALGORITHMS.has_key?(algorithm)
      algorithm_class = const_get(ALGORITHMS[algorithm])
      algorithm_class.new(enumerable, options)
    end
    module_function :new

  end
end

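The `new` module function above reduces to a symbol-keyed lookup table plus `const_get`. A minimal, self-contained sketch of that dispatch follows; `NormalizeSketch` and its stand-in classes are hypothetical and perform no normalization. The `to_sym` call, which lets string names work too, is an addition not present in the gem, and `def self.new` stands in for the original `module_function :new`.

```ruby
# Hypothetical stand-ins: only the dispatch logic mirrors rdf/normalize.rb.
module NormalizeSketch
  class Base
    attr_reader :dataset, :options

    def initialize(dataset, options)
      @dataset, @options = dataset, options
    end
  end

  class URGNA2012 < Base; end
  class URDNA2015 < Base; end

  # Algorithm name => constant name, as in RDF::Normalize::ALGORITHMS.
  ALGORITHMS = {
    urgna2012: :URGNA2012,
    urdna2015: :URDNA2015
  }.freeze

  # Pick the class for options[:algorithm] (default :urdna2015) and
  # instantiate it; unknown names raise ArgumentError.
  def self.new(dataset, options = {})
    algorithm = options.fetch(:algorithm, :urdna2015).to_sym
    unless ALGORITHMS.key?(algorithm)
      raise ArgumentError, "No algorithm defined for #{algorithm}"
    end
    const_get(ALGORITHMS[algorithm]).new(dataset, options)
  end
end

normalizer = NormalizeSketch.new([], algorithm: "urgna2012")
puts normalizer.class # prints NormalizeSketch::URGNA2012
```

Keeping the table frozen and validating before `const_get` means a typo in the algorithm name fails with a clear `ArgumentError` instead of a `NameError` from constant lookup.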
data/lib/rdf/normalize/base.rb
ADDED
@@ -0,0 +1,15 @@
module RDF::Normalize
  ##
  # Abstract mixin for pluggable normalization algorithms. Including classes must implement #each; RDF::Normalize.new selects and instantiates the default or requested algorithm.
  module Base
    attr_reader :dataset

    # Enumerates normalized statements
    #
    # @yield statement
    # @yieldparam [RDF::Statement] statement
    def each(&block)
      raise "Not Implemented"
    end
  end
end

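`Base` only fixes the contract that an algorithm class enumerates its statements through `#each`. The sketch below illustrates that contract with plain Ruby `Enumerable`; `BaseSketch` and `IdentityNormalizer` are hypothetical names, statements are plain strings, and "normalization" is just sorting. It raises `NotImplementedError` instead of the original bare `RuntimeError`; note that `NotImplementedError` is not a `StandardError`, so a bare `rescue` will not catch it.

```ruby
# Hypothetical mixin mirroring the contract of RDF::Normalize::Base.
module BaseSketch
  attr_reader :dataset

  # Including classes must override #each to yield normalized statements.
  def each(&block)
    raise NotImplementedError, "#{self.class} must implement #each"
  end
end

# Toy "algorithm": statements are strings and normalizing means sorting.
class IdentityNormalizer
  include Enumerable
  include BaseSketch

  def initialize(dataset)
    @dataset = dataset
  end

  def each(&block)
    return enum_for(:each) unless block
    dataset.sort.each(&block)
    self
  end
end

p IdentityNormalizer.new(["b", "a"]).to_a # prints ["a", "b"]
```

Because every other traversal (`to_a`, `first`, `map`) is derived from `#each`, an algorithm class only has to get that one method right.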
data/lib/rdf/normalize/carroll2001.rb
ADDED
@@ -0,0 +1,166 @@
module RDF::Normalize
  class Carroll2001
    include RDF::Enumerable
    include Base
    include Utils

    ##
    # Create an enumerable with grounded nodes
    #
    # @param [RDF::Enumerable] enumerable
    # @return [RDF::Enumerable]
    def initialize(enumerable, options)
      @dataset = enumerable
    end

    def each(&block)
      ground_statements, anon_statements = [], []
      dataset.each_statement do |statement|
        (statement.has_blank_nodes? ? anon_statements : ground_statements) << statement
      end

      nodes = anon_statements.map(&:to_quad).flatten.compact.select(&:node?).uniq

      # Create a hash signature of every node, based on the signature of
      # statements it exists in.
      # We also save hashes of nodes that cannot be reliably known; we will use
      # that information to eliminate possible recursion combinations.
      #
      # Any mappings given in the method parameters are considered grounded.
      hashes, ungrounded_hashes = hash_nodes(anon_statements, nodes, {})

      # FIXME: likely need to iterate until hashes and ungrounded_hashes are the same size
      while hashes.size != ungrounded_hashes.size
        raise "Not done"
      end

      # Enumerate all statements, replacing nodes with new ground nodes using the hash as an identifier
      ground_statements.each(&block)
      anon_statements.each do |statement|
        quad = statement.to_quad.compact.map do |term|
          term.node? ? RDF::Node.intern(hashes[term]) : term
        end
        block.call RDF::Statement.from(quad)
      end
    end

    private

    # Given a set of statements, create a mapping of node => SHA1 for a given
    # set of blank nodes.
    #
    # Returns a tuple of hashes: one of grounded hashes, and one of all
    # hashes. Grounded hashes are based on non-blank nodes and grounded blank
    # nodes, and can be used to determine if a node's signature matches
    # another.
    #
    # @param [Array] statements
    # @param [Array] nodes
    # @param [Hash] grounded_hashes
    #   mapping of node => SHA1 pairs as input, used to create more specific signatures of other nodes.
    # @private
    # @return [Hash, Hash]
    def hash_nodes(statements, nodes, grounded_hashes)
      hashes = grounded_hashes.dup
      ungrounded_hashes = {}
      hash_needed = true

      # We may have to go over the list multiple times. If a node is marked as
      # grounded, other nodes can then use it to decide their own state of
      # grounded.
      while hash_needed
        starting_grounded_nodes = hashes.size
        nodes.each do |node|
          unless hashes.member? node
            grounded, hash = node_hash_for(node, statements, hashes)
            if grounded
              hashes[node] = hash
            end
            ungrounded_hashes[node] = hash
          end
        end

        # After going over the list, any nodes with a unique hash can be marked
        # as grounded, even if we have not tied them back to a root yet.
        uniques = {}
        ungrounded_hashes.each do |node, hash|
          uniques[hash] = uniques.has_key?(hash) ? false : node
        end
        uniques.each do |hash, node|
          hashes[node] = hash if node
        end
        hash_needed = starting_grounded_nodes != hashes.size
      end
      [hashes, ungrounded_hashes]
    end

    # Generate a hash for a node based on the signature of the statements it
    # appears in. Signatures consist of grounded elements in statements
    # associated with a node, that is, anything but an ungrounded anonymous
    # node. Creating the hash is simply hashing a sorted list of each
    # statement's signature, which is itself a concatenation of the string form
    # of all grounded elements.
    #
    # Nodes other than the given node are considered grounded if they are a
    # member in the given hash.
    #
    # @param [RDF::Node] node
    # @param [Array<RDF::Statement>] statements
    # @param [Hash] hashes
    # @return [Boolean, String]
    #   a tuple consisting of grounded being true or false and the String for the hash
    def node_hash_for(node, statements, hashes)
      statement_signatures = []
      grounded = true
      statements.each do |statement|
        if statement.to_quad.include?(node)
          statement_signatures << hash_string_for(statement, hashes, node)
          statement.to_quad.compact.each do |resource|
            grounded = false unless grounded?(resource, hashes) || resource == node
          end
        end
      end
      # Note that we sort the signatures--without a canonical ordering,
      # we might get different hashes for equivalent nodes.
      [grounded, Digest::SHA1.hexdigest(statement_signatures.sort.to_s)]
    end

    # Provide a string signature for the given statement, collecting
    # string signatures for grounded node elements.
    # @return [String]
    def hash_string_for(statement, hashes, node)
      statement.to_quad.map {|r| string_for_node(r, hashes, node)}.join("")
    end

    # Returns true if a given node is grounded
    # A node is grounded if it is not a blank node or it is included
    # in the given mapping of grounded nodes.
    # @return [Boolean]
    def grounded?(node, hashes)
      (!(node.node?)) || (hashes.member? node)
    end

    # Provides a string for the given node for use in a string signature
    # Non-anonymous nodes will return their string form. Grounded anonymous
    # nodes will return their hashed form.
    # @return [String]
    def string_for_node(node, hashes, target)
      case
      when node.nil?
        ""
      when node == target
        "itself"
      when node.node? && hashes.member?(node)
        hashes[node]
      when node.node?
        "a blank node"
      # RDF.rb auto-boxing magic makes some literals the same when they
      # should not be; the ntriples serializer will take care of us
      when node.literal?
        node.class.name + RDF::NTriples.serialize(node)
      else
        node.to_s
      end
    end
  end
end

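The grounding idea behind `hash_nodes` and `node_hash_for` can be sketched without RDF.rb. In this simplified version (an assumption, not the gem's code), triples are arrays of strings, any term starting with `_:` is a blank node, and only the unique-signature rule is applied; the original additionally grounds a node whose statements contain nothing but grounded terms, and updates its map mid-pass rather than after each pass. The symmetric example shows the case this approach cannot separate, which is where `Carroll2001#each` raises "Not done".

```ruby
require 'digest/sha1'

# Blank nodes are strings starting with "_:"; every other term is ground.
def blank?(term)
  term.start_with?("_:")
end

# Signature of +node+: for each triple mentioning it, render the node as
# "itself", grounded blank nodes by their hash, other blank nodes as
# "a blank node", then hash the sorted renderings (cf. node_hash_for).
def signature_for(node, triples, grounded)
  rendered = triples.select { |t| t.include?(node) }.map do |triple|
    triple.map do |term|
      if term == node then "itself"
      elsif blank?(term) then grounded.fetch(term, "a blank node")
      else term
      end
    end.join(" ")
  end
  Digest::SHA1.hexdigest(rendered.sort.join("\n"))
end

# Repeatedly ground every node whose signature is unique among the
# still-ungrounded nodes, until a pass grounds nothing new.
def ground_nodes(triples)
  nodes = triples.flatten.select { |t| blank?(t) }.uniq
  grounded = {}
  loop do
    candidates = (nodes - grounded.keys).map { |n| [n, signature_for(n, triples, grounded)] }
    counts = Hash.new(0)
    candidates.each { |_n, sig| counts[sig] += 1 }
    fresh = candidates.select { |_n, sig| counts[sig] == 1 }
    break if fresh.empty?
    fresh.each { |n, sig| grounded[n] = sig }
  end
  grounded
end

ex = "http://example.org/"
distinct = [
  ["_:x", "<#{ex}name>", '"Alice"'],
  ["_:y", "<#{ex}name>", '"Bob"'],
  ["_:x", "<#{ex}knows>", "_:y"]
]
symmetric = [["_:a", "<#{ex}p>", "_:b"], ["_:b", "<#{ex}p>", "_:a"]]

p ground_nodes(distinct).keys # both blank nodes are grounded
p ground_nodes(symmetric)     # prints {}: the two nodes are indistinguishable
```

Since signatures never mention the original blank-node labels, renaming the blank nodes yields the same set of hashes, which is what makes them usable as canonical identifiers.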
data/lib/rdf/normalize/urdna2015.rb
ADDED
@@ -0,0 +1,264 @@
module RDF::Normalize
  class URDNA2015
    include RDF::Enumerable
    include Base
    include Utils

    ##
    # Create an enumerable with grounded nodes
    #
    # @param [RDF::Enumerable] enumerable
    # @return [RDF::Enumerable]
    def initialize(enumerable, options)
      @dataset, @options = enumerable, options
    end

    def each(&block)
      ns = NormalizationState.new(@options)
      normalize_statements(ns, &block)
    end

    protected
    def normalize_statements(ns, &block)
      # Map BNodes to the statements they are used by
      dataset.each_statement do |statement|
        statement.to_quad.compact.select(&:node?).each do |node|
          ns.add_statement(node, statement)
        end
      end

      non_normalized_identifiers, simple = ns.bnode_to_statements.keys, true

      while simple
        simple = false
        ns.hash_to_bnodes = {}

        # Calculate hashes for first degree nodes
        non_normalized_identifiers.each do |node|
          hash = depth {ns.hash_first_degree_quads(node)}
          debug("1deg") {"hash: #{hash}"}
          ns.add_bnode_hash(node, hash)
        end

        # Create canonical replacements for hashes mapping to a single node
        ns.hash_to_bnodes.keys.sort.each do |hash|
          identifier_list = ns.hash_to_bnodes[hash]
          next if identifier_list.length > 1
          node = identifier_list.first
          id = ns.canonical_issuer.issue_identifier(node)
          debug("single node") {"node: #{node.to_ntriples}, hash: #{hash}, id: #{id}"}
          non_normalized_identifiers -= identifier_list
          ns.hash_to_bnodes.delete(hash)
          simple = true
        end
      end

      # Iterate over hashes having more than one node
      ns.hash_to_bnodes.keys.sort.each do |hash|
        identifier_list = ns.hash_to_bnodes[hash]

        debug("multiple nodes") {"node: #{identifier_list.map(&:to_ntriples).join(",")}, hash: #{hash}"}
        hash_path_list = []

        # Create a hash_path_list for all bnodes using a temporary identifier used to create canonical replacements
        identifier_list.each do |identifier|
          next if ns.canonical_issuer.issued.include?(identifier)
          temporary_issuer = IdentifierIssuer.new("_:b")
          temporary_issuer.issue_identifier(identifier)
          hash_path_list << depth {ns.hash_n_degree_quads(identifier, temporary_issuer)}
        end
        debug("->") {"hash_path_list: #{hash_path_list.map(&:first).inspect}"}

        # Create canonical replacements for nodes
        hash_path_list.sort_by(&:first).map(&:last).each do |issuer|
          issuer.issued.each do |node|
            id = ns.canonical_issuer.issue_identifier(node)
            debug("-->") {"node: #{node.to_ntriples}, id: #{id}"}
          end
        end
      end

      # Yield statements using BNodes from canonical replacements
      dataset.each_statement do |statement|
        if statement.has_blank_nodes?
          quad = statement.to_quad.compact.map do |term|
            term.node? ? RDF::Node.intern(ns.canonical_issuer.identifier(term)[2..-1]) : term
          end
          block.call RDF::Statement.from(quad)
        else
          block.call statement
        end
      end
    end

    private

    class NormalizationState
      include Utils

      attr_accessor :bnode_to_statements
      attr_accessor :hash_to_bnodes
      attr_accessor :canonical_issuer

      def initialize(options)
        @options = options
        @bnode_to_statements, @hash_to_bnodes, @canonical_issuer = {}, {}, IdentifierIssuer.new("_:c14n")
      end

      def add_statement(node, statement)
        bnode_to_statements[node] ||= []
        bnode_to_statements[node] << statement unless bnode_to_statements[node].include?(statement)
      end

      def add_bnode_hash(node, hash)
        hash_to_bnodes[hash] ||= []
        hash_to_bnodes[hash] << node unless hash_to_bnodes[hash].include?(node)
      end

      # @param [RDF::Node] node
      # @return [String] the SHA1 hexdigest hash of statements using this node, with replacements
      def hash_first_degree_quads(node)
        quads = bnode_to_statements[node].
          map do |statement|
            quad = statement.to_quad.map do |t|
              case t
              when node then RDF::Node("a")
              when RDF::Node then RDF::Node("z")
              else t
              end
            end
            RDF::NQuads::Writer.serialize(RDF::Statement.from(quad))
          end

        debug("1deg") {"node: #{node}, quads: #{quads}"}
        hexdigest(quads.sort.join)
      end

      # @param [RDF::Node] related
      # @param [RDF::Statement] statement
      # @param [IdentifierIssuer] issuer
      # @param [Symbol] position one of :s, :o, or :g
      # @return [String] the SHA1 hexdigest hash
      def hash_related_node(related, statement, issuer, position)
        identifier = canonical_issuer.identifier(related) ||
                     issuer.identifier(related) ||
                     hash_first_degree_quads(related)
        input = position.to_s
        input << statement.predicate.to_ntriples unless position == :g
        input << identifier
        debug("hrel") {"input: #{input.inspect}, hash: #{hexdigest(input)}"}
        hexdigest(input)
      end

      # @param [RDF::Node] identifier
      # @param [IdentifierIssuer] issuer
      # @return [Array<String,IdentifierIssuer>] the Hash and issuer
      def hash_n_degree_quads(identifier, issuer)
        debug("ndeg") {"identifier: #{identifier.to_ntriples}"}

        # hash to related blank nodes map
        map = {}

        bnode_to_statements[identifier].each do |statement|
          hash_related_statement(identifier, statement, issuer, map)
        end

        data_to_hash = ""

        debug("ndeg") {"map: #{map.map {|h,l| "#{h}: #{l.map(&:to_ntriples)}"}.join('; ')}"}
        depth do
          map.keys.sort.each do |hash|
            list = map[hash]
            # Iterate over related nodes
            chosen_path, chosen_issuer = "", nil
            data_to_hash += hash

            list.permutation do |permutation|
              debug("ndeg") {"perm: #{permutation.map(&:to_ntriples).join(",")}"}
              issuer_copy, path, recursion_list = issuer.dup, "", []

              permutation.each do |related|
                if canonical_issuer.identifier(related)
                  path << canonical_issuer.issue_identifier(related)
                else
                  recursion_list << related if !issuer_copy.identifier(related)
                  path << issuer_copy.issue_identifier(related)
                end

                # Skip to the next permutation if chosen path isn't empty and the path is greater than the chosen path
                break if !chosen_path.empty? && path.length >= chosen_path.length && path > chosen_path
              end
              debug("ndeg") {"hash: #{hash}, path: #{path}, recursion: #{recursion_list.map(&:to_ntriples)}"}

              recursion_list.each do |related|
                result = depth {hash_n_degree_quads(related, issuer_copy)}
                path << issuer_copy.issue_identifier(related)
                path << "<#{result.first}>"
                issuer_copy = result.last
                break if !chosen_path.empty? && path.length >= chosen_path.length && path > chosen_path
              end

              if chosen_path.empty? || path < chosen_path
                chosen_path, chosen_issuer = path, issuer_copy
              end
            end

            data_to_hash += chosen_path
            issuer = chosen_issuer
          end
        end

        debug("ndeg") {"datatohash: #{data_to_hash.inspect}, hash: #{hexdigest(data_to_hash)}"}
        return [hexdigest(data_to_hash), issuer]
      end

      protected

      # FIXME: should be SHA-256.
      def hexdigest(val)
        Digest::SHA1.hexdigest(val)
      end

      # Group adjacent bnodes by hash
      def hash_related_statement(identifier, statement, issuer, map)
        statement.to_hash(:s, :p, :o, :g).each do |pos, term|
          next if !term.is_a?(RDF::Node) || term == identifier

          hash = depth {hash_related_node(term, statement, issuer, pos)}
          map[hash] ||= []
          map[hash] << term unless map[hash].include?(term)
        end
      end
    end

    class IdentifierIssuer
      def initialize(prefix = "_:c14n")
        @prefix, @counter, @issued = prefix, 0, {}
      end

      # Return an identifier for this BNode
      def issue_identifier(node)
        @issued[node] ||= begin
          res, @counter = @prefix + @counter.to_s, @counter + 1
          res
        end
      end

      def issued
        @issued.keys
      end

      def identifier(node)
        @issued[node]
      end

      # Duplicate this issuer, ensuring that the issued identifiers remain distinct
      # @return [IdentifierIssuer]
      def dup
        other = super
        other.instance_variable_set(:@issued, @issued.dup)
        other
      end
    end
  end
end

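The two building blocks of `URDNA2015` that are easiest to isolate are the identifier issuer and the first-degree hash. The following sketch re-creates both with plain strings instead of `RDF::Node` terms; `IssuerSketch` and `hash_first_degree` are hypothetical names. It hashes with SHA-256, as the `FIXME` in `hexdigest` says it should, whereas the 0.1.0 code still uses SHA-1, so its digests will not match this gem's output. Copy isolation uses `initialize_copy` instead of overriding `dup`, which also covers `clone`.

```ruby
require 'digest/sha2'

# Simplified counterpart of IdentifierIssuer: hands out prefix + counter
# labels, remembering which node received which label.
class IssuerSketch
  def initialize(prefix = "_:c14n")
    @prefix, @counter, @issued = prefix, 0, {}
  end

  # Return the existing label for +node+, or issue the next one.
  def issue_identifier(node)
    @issued[node] ||= begin
      label = "#{@prefix}#{@counter}"
      @counter += 1
      label
    end
  end

  def identifier(node)
    @issued[node]
  end

  def issued
    @issued.keys
  end

  # dup/clone copy ivars shallowly; give the copy its own label table.
  def initialize_copy(source)
    super
    @issued = @issued.dup
  end
end

# First-degree hash: serialize every quad that mentions +node+, writing
# +node+ as _:a and any other blank node as _:z, then hash the sorted lines.
def hash_first_degree(node, quads)
  lines = quads.select { |q| q.include?(node) }.map do |quad|
    quad.map do |term|
      if term == node then "_:a"
      elsif term.start_with?("_:") then "_:z"
      else term
      end
    end.join(" ") + " .\n"
  end
  Digest::SHA256.hexdigest(lines.sort.join)
end

quads = [
  ["_:e0", "<http://example.org/p>", '"x"'],
  ["_:e1", "<http://example.org/p>", '"y"']
]
canonical = IssuerSketch.new
# Both first-degree hashes differ here, so each node is labeled in hash order.
%w[_:e0 _:e1].sort_by { |n| hash_first_degree(n, quads) }.each do |n|
  canonical.issue_identifier(n)
end
p canonical.issued.map { |n| [n, canonical.identifier(n)] }
```

Renaming every blank node consistently leaves each first-degree hash unchanged, which is why a unique hash can safely decide the order in which canonical labels are issued.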
data/lib/rdf/normalize/urgna2012.rb
ADDED
@@ -0,0 +1,47 @@
module RDF::Normalize
  class URGNA2012 < URDNA2015

    def each(&block)
      ns = NormalizationState.new(@options)
      normalize_statements(ns, &block)
    end

    class NormalizationState < URDNA2015::NormalizationState
      protected

      # 2012 version uses SHA-1
      def hexdigest(val)
        Digest::SHA1.hexdigest(val)
      end

      # @param [RDF::Node] related
      # @param [RDF::Statement] statement
      # @param [IdentifierIssuer] issuer
      # @param [Symbol] position one of :p or :r
      # @return [String] the SHA1 hexdigest hash
      def hash_related_node(related, statement, issuer, position)
        identifier = canonical_issuer.identifier(related) ||
                     issuer.identifier(related) ||
                     hash_first_degree_quads(related)
        input = position.to_s
        input << statement.predicate.to_s
        input << identifier
        debug("hrel") {"input: #{input.inspect}, hash: #{hexdigest(input)}"}
        hexdigest(input)
      end

      # In URGNA2012, the position parameter passed to the Hash Related Blank Node algorithm was instead modeled as a direction parameter, where it could have the value p, for property, when the related blank node was a `subject` and the value r, for reverse or reference, when the related blank node was an `object`. Since URGNA2012 only normalized graphs, not datasets, there was no use of the `graph` position.
      def hash_related_statement(identifier, statement, issuer, map)
        if statement.subject.node? && statement.subject != identifier
          hash = depth {hash_related_node(statement.subject, statement, issuer, :p)}
          map[hash] ||= []
          map[hash] << statement.subject unless map[hash].include?(statement.subject)
        elsif statement.object.node? && statement.object != identifier
          hash = depth {hash_related_node(statement.object, statement, issuer, :r)}
          map[hash] ||= []
          map[hash] << statement.object unless map[hash].include?(statement.object)
        end
      end
    end
  end
end

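The only differences between `URGNA2012::NormalizationState` and its parent are the digest, the position tag, and the predicate form in the Hash Related Blank Node input. A sketch contrasting the two input strings follows, with hypothetical helper names and terms as plain strings. As above, it uses SHA-256 for the 2015 variant per the `FIXME`, not the SHA-1 that the 0.1.0 code actually computes.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# URDNA2015-style input: position tag (s, o or g), then the predicate in
# N-Triples form unless the related node is the graph name, then the
# related node's identifier. SHA-256 follows the FIXME, not 0.1.0.
def urdna2015_related_hash(position, predicate_nt, identifier)
  input = position.to_s
  input += predicate_nt unless position == :g
  Digest::SHA256.hexdigest(input + identifier)
end

# URGNA2012-style input: direction tag (p or r), the bare predicate IRI,
# then the identifier, hashed with SHA-1.
def urgna2012_related_hash(direction, predicate_iri, identifier)
  Digest::SHA1.hexdigest(direction.to_s + predicate_iri + identifier)
end

# Which other blank node a triple relates +identifier+ to under URGNA2012:
# :p when it is the subject, :r when it is the object, nil otherwise.
def urgna2012_direction(triple, identifier)
  subject, _predicate, object = triple
  if subject.start_with?("_:") && subject != identifier
    [:p, subject]
  elsif object.start_with?("_:") && object != identifier
    [:r, object]
  end
end

triple = ["_:b0", "<http://example.org/knows>", "_:b1"]
p urgna2012_direction(triple, "_:b1") # prints [:p, "_:b0"]
puts urdna2015_related_hash(:s, triple[1], "_:c14n0")
puts urgna2012_related_hash(:p, "http://example.org/knows", "_:c14n0")
```

Using `+=` instead of `<<` builds a new string each time, so the sketch never mutates a caller's string; the gem's `position.to_s` already returns a fresh string, so its `<<` is safe as well.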
data/lib/rdf/normalize/utils.rb
ADDED
@@ -0,0 +1,33 @@
module RDF::Normalize
  module Utils
    # Add debug event to debug array, if specified
    #
    # @param [String] message
    # @yieldreturn [String] appended to message, to allow for lazy evaluation of message
    def debug(*args)
      options = args.last.is_a?(Hash) ? args.pop : {}
      return unless options[:debug] || @options[:debug]
      depth = options[:depth] || @options[:depth] || 0
      d_str = depth > 100 ? ' ' * 100 + '+' : ' ' * depth
      list = args
      list << yield if block_given?
      message = d_str + (list.empty? ? "" : list.join(": "))
      options[:debug] << message if options[:debug].is_a?(Array)
      @options[:debug] << message if @options[:debug].is_a?(Array)
      $stderr.puts(message) if @options[:debug] == true
    end
    module_function :debug

    # Increase depth around a method invocation
    # @yield
    #   Yields with no arguments
    # @yieldreturn [Object] returns the result of yielding
    # @return [Object]
    def depth
      @options[:depth] = (@options[:depth] || 0) + 1
      ret = yield
      @options[:depth] -= 1
      ret
    end
  end
end

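A sketch of the two helpers in `Utils`, written so that the depth counter starts at zero when unset, is restored even if the block raises (via `ensure`), and a `debug: true` option prints to `$stderr`. `DebugSketch` and `Tracer` are hypothetical names, and the per-call options hash that the gem's `debug` also accepts is omitted.

```ruby
# Hypothetical re-implementation of the Utils helpers; expects the
# including object to keep its options in @options.
module DebugSketch
  # Run the block one level deeper; the level is restored even on error.
  def depth
    @options[:depth] = (@options[:depth] || 0) + 1
    yield
  ensure
    @options[:depth] -= 1
  end

  # Record a message indented by the current depth (capped at 100).
  # debug: <Array> collects messages; debug: true prints to $stderr.
  def debug(*parts)
    sink = @options[:debug]
    return unless sink
    parts << yield if block_given?
    message = " " * [@options[:depth] || 0, 100].min + parts.join(": ")
    case sink
    when Array then sink << message
    when true then $stderr.puts(message)
    end
  end
end

class Tracer
  include DebugSketch

  def initialize(options)
    @options = options
  end

  def run
    depth { debug("step") { "inside" } }
  end
end

log = []
Tracer.new(debug: log).run
p log # prints [" step: inside"]
```

Taking the message from a block means the string is only built when debugging is enabled, which matters inside the permutation loops of `hash_n_degree_quads`.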
data/lib/rdf/normalize/writer.rb
ADDED
@@ -0,0 +1,79 @@
module RDF::Normalize
  ##
  # An RDF Graph normalization serializer.
  #
  # Serializes the enumerated statements in normalized form as N-Quads.
  #
  # @author [Gregg Kellogg](http://kellogg-assoc.com/)
  class Writer < RDF::NQuads::Writer
    format RDF::Normalize::Format

    # @return [RDF::Repository] Repository of statements to serialize
    attr_accessor :repo

    ##
    # Initializes the writer instance.
    #
    # @param  [IO, File] output
    #   the output stream
    # @param  [Hash{Symbol => Object}] options
    #   any additional options
    # @yield  [writer] `self`
    # @yieldparam  [RDF::Writer] writer
    # @yieldreturn [void]
    # @yield  [writer]
    # @yieldparam [RDF::Writer] writer
    def initialize(output = $stdout, options = {}, &block)
      super do
        @options[:depth] ||= 0
        @repo = RDF::Repository.new
        if block_given?
          case block.arity
          when 0 then instance_eval(&block)
          else block.call(self)
          end
        end
      end
    end

    ##
    # Defer writing to epilogue
    def write_statement(statement)
      self
    end

    ##
    # Outputs the normalized N-Quads representation of all stored statements.
    #
    # @return [void]
    def write_epilogue
      statements = RDF::Normalize.new(@repo, @options).
        statements.
        reject(&:variable?).
        map {|s| format_statement(s)}.
        sort.
        each do |line|
          puts line
        end
    end

    protected

    ##
    # Adds a statement to be serialized
    # @param  [RDF::Statement] statement
    # @return [void]
    def insert_statement(statement)
      @repo.insert(statement)
    end

    ##
    # Insert an Enumerable
    #
    # @param  [RDF::Enumerable] enumerable
    # @return [void]
    def insert_statements(enumerable)
      @repo = enumerable
    end
  end
end

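`Writer` buffers everything in `@repo` and writes nothing until `write_epilogue`, because canonical blank-node labels depend on the whole dataset. A stdlib-only sketch of that buffering pattern follows. `BufferedSortedWriter` is hypothetical, a caller-supplied lambda stands in for `RDF::Normalize.new`, and statements are arrays of N-Quads term strings.

```ruby
require 'stringio'

# Hypothetical writer illustrating "buffer now, normalize and emit later".
# +normalizer+ stands in for RDF::Normalize.new: it receives every buffered
# statement at once and returns the statements to write.
class BufferedSortedWriter
  def initialize(output, normalizer)
    @output, @normalizer, @buffer = output, normalizer, []
  end

  # Like Writer#insert_statement: remember the statement, write nothing yet.
  def <<(statement)
    @buffer << statement
    self
  end

  # Like Writer#write_epilogue: normalize the whole buffer, format each
  # statement as an N-Quads-style line, sort, and write.
  def finish
    lines = @normalizer.call(@buffer).map { |terms| terms.join(" ") + " ." }
    lines.sort.each { |line| @output.puts(line) }
  end
end

out = StringIO.new
writer = BufferedSortedWriter.new(out, ->(statements) { statements.uniq })
writer << ["<http://example.org/b>", "<http://example.org/p>", '"2"']
writer << ["<http://example.org/a>", "<http://example.org/p>", '"1"']
writer << ["<http://example.org/a>", "<http://example.org/p>", '"1"']
writer.finish
print out.string
```

Sorting the formatted lines, as `write_epilogue` does, makes the output independent of the order in which statements were inserted.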
metadata
ADDED
@@ -0,0 +1,160 @@
--- !ruby/object:Gem::Specification
name: rdf-normalize
version: !ruby/object:Gem::Version
  version: 0.1.0
platform: ruby
authors:
- Gregg Kellogg
autorequire:
bindir: bin
cert_chain: []
date: 2015-05-20 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: rdf
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '1.1'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '1.1'
- !ruby/object:Gem::Dependency
  name: rdf-spec
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '1.1'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '1.1'
- !ruby/object:Gem::Dependency
  name: open-uri-cached
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '0.0'
    - - ">="
      - !ruby/object:Gem::Version
        version: 0.0.5
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '0.0'
    - - ">="
      - !ruby/object:Gem::Version
        version: 0.0.5
- !ruby/object:Gem::Dependency
  name: rspec
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '3.2'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '3.2'
- !ruby/object:Gem::Dependency
  name: webmock
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '1.17'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '1.17'
- !ruby/object:Gem::Dependency
  name: json-ld
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '1.1'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '1.1'
- !ruby/object:Gem::Dependency
  name: yard
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '0.8'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '0.8'
description: RDF::Normalize is a Graph normalizer for the RDF.rb library suite.
email: public-rdf-ruby@w3.org
executables: []
extensions: []
extra_rdoc_files: []
files:
- AUTHORS
- LICENSE
- README.md
- VERSION
- lib/rdf/normalize.rb
- lib/rdf/normalize/base.rb
- lib/rdf/normalize/carroll2001.rb
- lib/rdf/normalize/format.rb
- lib/rdf/normalize/urdna2015.rb
- lib/rdf/normalize/urgna2012.rb
- lib/rdf/normalize/utils.rb
- lib/rdf/normalize/writer.rb
homepage: http://github.com/gkellogg/rdf-normalize
licenses:
- Public Domain
metadata: {}
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: 1.9.2
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubyforge_project: rdf-normalize
rubygems_version: 2.4.7
signing_key:
specification_version: 4
summary: RDF Graph normalizer for Ruby.
test_files: []
has_rdoc: false
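The dependency entries above use RubyGems' pessimistic operator: `~> 1.1` accepts any release at or above 1.1 but below 2.0, and `open-uri-cached` pairs `~> 0.0` with a `>= 0.0.5` floor. The sketch below checks candidate versions against those constraints using RubyGems' own `Gem::Requirement` and `Gem::Version`. The `CONSTRAINTS` hash and `satisfies?` helper are illustrative names, and the sample version numbers are arbitrary inputs, not claims about which releases of these gems exist.

```ruby
require "rubygems"

# Dependency constraints copied from the gemspec metadata above
# (CONSTRAINTS is an illustrative name, not part of the gem).
CONSTRAINTS = {
  "rdf"             => ["~> 1.1"],
  "rdf-spec"        => ["~> 1.1"],
  "open-uri-cached" => ["~> 0.0", ">= 0.0.5"],
  "rspec"           => ["~> 3.2"],
  "webmock"         => ["~> 1.17"],
  "json-ld"         => ["~> 1.1"],
  "yard"            => ["~> 0.8"],
}.freeze

# Returns true when +version+ satisfies every constraint listed for +name+.
def satisfies?(name, version)
  requirement = Gem::Requirement.new(*CONSTRAINTS.fetch(name))
  requirement.satisfied_by?(Gem::Version.new(version))
end

puts satisfies?("rdf", "1.1.17")            # true:  >= 1.1 and < 2.0
puts satisfies?("rdf", "2.0.0")             # false: "~> 1.1" excludes 2.x
puts satisfies?("open-uri-cached", "0.0.4") # false: below the ">= 0.0.5" floor
puts satisfies?("open-uri-cached", "0.0.5") # true
puts satisfies?("webmock", "1.21.0")        # true:  >= 1.17 and < 2.0
```

Declaring `~>` on a two-segment version such as `'1.1'` allows minor and patch upgrades within the same major version while excluding the next major release.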