rdf-normalize 0.1.0
- checksums.yaml +7 -0
- data/AUTHORS +1 -0
- data/LICENSE +25 -0
- data/README.md +82 -0
- data/VERSION +1 -0
- data/lib/rdf/normalize.rb +66 -0
- data/lib/rdf/normalize/base.rb +15 -0
- data/lib/rdf/normalize/carroll2001.rb +166 -0
- data/lib/rdf/normalize/format.rb +11 -0
- data/lib/rdf/normalize/urdna2015.rb +264 -0
- data/lib/rdf/normalize/urgna2012.rb +47 -0
- data/lib/rdf/normalize/utils.rb +33 -0
- data/lib/rdf/normalize/writer.rb +79 -0
- metadata +160 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: d7076dcfeccdbfc0b35ec046d0b338a6ad41d776
+  data.tar.gz: cd5f278797b575a3a6cced04890b9014c2350f42
+SHA512:
+  metadata.gz: 2510cec72f19af6eef55678382f688a5948b59fad4c6be18465c53a66d16b25bb543dadbd5a5148675d291afac133c8cc7d5399650ad9043b858c1a1f6165291
+  data.tar.gz: f785bd00b4abacf7da181daae96e2101aa449b19791a08742654451cf9a3b25abdf368dd77d99a86c4f74a49084b2d9af6464ea30bbed45589f87713e899ee63
data/AUTHORS
ADDED
@@ -0,0 +1 @@
+* Gregg Kellogg <gregg@greggkellogg.net>
data/LICENSE
ADDED
@@ -0,0 +1,25 @@
+This is free and unencumbered software released into the public domain.
+
+Anyone is free to copy, modify, publish, use, compile, sell, or
+distribute this software, either in source code form or as a compiled
+binary, for any purpose, commercial or non-commercial, and by any
+means.
+
+In jurisdictions that recognize copyright laws, the author or authors
+of this software dedicate any and all copyright interest in the
+software to the public domain. We make this dedication for the benefit
+of the public at large and to the detriment of our heirs and
+successors. We intend this dedication to be an overt act of
+relinquishment in perpetuity of all present and future rights to this
+software under copyright law.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
+
+For more information, please refer to <http://unlicense.org>
+
data/README.md
ADDED
@@ -0,0 +1,82 @@
+# RDF::Normalize
+RDF Graph normalizer for [RDF.rb][RDF.rb].
+
+[![Gem Version](https://badge.fury.io/rb/rdf-normalize.png)](http://badge.fury.io/rb/rdf-normalize)
+[![Build Status](https://secure.travis-ci.org/ruby-rdf/rdf-normalize.png?branch=master)](http://travis-ci.org/ruby-rdf/rdf-normalize)
+
+## Description
+This is a [Ruby][] implementation of [RDF Normalize][] for [RDF.rb][].
+
+## Features
+RDF::Normalize generates normalized [N-Quads][] output for an RDF Dataset using the algorithm
+defined in [RDF Normalize][]. It also implements an RDF Writer interface, which can be used
+to serialize normalized statements.
+
+Algorithms implemented:
+
+* [URGNA2012](http://json-ld.github.io/normalization/spec/index.html#dfn-urgna2012)
+* [URDNA2015](http://json-ld.github.io/normalization/spec/index.html#dfn-urdna2015)
+
+Install with `gem install rdf-normalize`
+
+* 100% free and unencumbered [public domain](http://unlicense.org/) software.
+* Compatible with Ruby >= 1.9.3.
+
+## Usage
+
+## Documentation
+Full documentation available on [Rubydoc.info][Normalize doc]
+
+### Principal Classes
+* {RDF::Normalize}
+  * {RDF::Normalize::Base}
+  * {RDF::Normalize::Format}
+  * {RDF::Normalize::Writer}
+  * {RDF::Normalize::URGNA2012}
+  * {RDF::Normalize::URDNA2015}
+
+
+## Dependencies
+
+* [Ruby](http://ruby-lang.org/) (>= 1.9.2)
+* [RDF.rb](http://rubygems.org/gems/rdf) (~> 1.1)
+
+## Installation
+
+The recommended installation method is via [RubyGems](http://rubygems.org/).
+To install the latest official release of the `RDF::Normalize` gem, do:
+
+    % [sudo] gem install rdf-normalize
+
+## Mailing List
+* <http://lists.w3.org/Archives/Public/public-rdf-ruby/>
+
+## Author
+* [Gregg Kellogg](http://github.com/gkellogg) - <http://greggkellogg.net/>
+
+## Contributing
+* Do your best to adhere to the existing coding conventions and idioms.
+* Don't use hard tabs, and don't leave trailing whitespace on any line.
+* Do document every method you add using [YARD][] annotations. Read the
+  [tutorial][YARD-GS] or just look at the existing code for examples.
+* Don't touch the `.gemspec`, `VERSION` or `AUTHORS` files. If you need to
+  change them, do so on your private branch only.
+* Do feel free to add yourself to the `CREDITS` file and the corresponding
+  list in the `README`. Alphabetical order applies.
+* Do note that in order for us to merge any non-trivial changes (as a rule
+  of thumb, additions larger than about 15 lines of code), we need an
+  explicit [public domain dedication][PDD] on record from you.
+
+## License
+This is free and unencumbered public domain software. For more information,
+see <http://unlicense.org/> or the accompanying {file:LICENSE} file.
+
+[Ruby]: http://ruby-lang.org/
+[RDF]: http://www.w3.org/RDF/
+[YARD]: http://yardoc.org/
+[YARD-GS]: http://rubydoc.info/docs/yard/file/docs/GettingStarted.md
+[PDD]: http://lists.w3.org/Archives/Public/public-rdf-ruby/2010May/0013.html
+[RDF.rb]: http://rubydoc.info/github/ruby-rdf/rdf-normalize
+[N-Triples]: http://www.w3.org/TR/rdf-testcases/#ntriples
+[RDF Normalize]:http://json-ld.github.io/normalization/spec/
+[Normalize doc]:http://rubydoc.info/github/ruby-rdf/rdf-normalize/master/file/README.markdown
data/VERSION
ADDED
@@ -0,0 +1 @@
+0.1.0
data/lib/rdf/normalize.rb
ADDED
@@ -0,0 +1,66 @@
+require 'rdf'
+
+module RDF
+  ##
+  # **`RDF::Normalize`** is an RDF Graph normalization plugin for RDF.rb.
+  #
+  # @example Requiring the `RDF::Normalize` module
+  #   require 'rdf/normalize'
+  #
+  # @example Returning an iterator for normalized statements
+  #
+  #     g = RDF::Graph.load("etc/doap.ttl")
+  #     RDF::Normalize.new(g).each_statement do |statement|
+  #       puts statement.inspect
+  #     end
+  #
+  # @example Returning normalized N-Quads
+  #
+  #     g = RDF::Graph.load("etc/doap.ttl")
+  #     g.dump(:normalize)
+  #
+  # @example Writing a repository as normalized N-Quads
+  #
+  #     RDF::Normalize::Writer.open("etc/doap.nq") do |writer|
+  #       writer << RDF::Repository.load("etc/doap.ttl")
+  #     end
+  #
+  # @author [Gregg Kellogg](http://greggkellogg.net/)
+  module Normalize
+    require 'rdf/normalize/format'
+    require 'rdf/normalize/utils'
+    autoload :Base, 'rdf/normalize/base'
+    autoload :Carroll2001, 'rdf/normalize/carroll2001'
+    autoload :URGNA2012, 'rdf/normalize/urgna2012'
+    autoload :URDNA2015, 'rdf/normalize/urdna2015'
+    autoload :VERSION, 'rdf/normalize/version'
+    autoload :Writer, 'rdf/normalize/writer'
+
+    # Enumerable to normalize
+    # @return [RDF::Enumerable]
+    attr_accessor :dataset
+
+    ALGORITHMS = {
+      carroll2001: :Carroll2001,
+      urgna2012: :URGNA2012,
+      urdna2015: :URDNA2015
+    }.freeze
+
+    ##
+    # Creates a new normalizer instance using either the specified or default normalizer algorithm
+    # @param [RDF::Enumerable] enumerable
+    # @param [Hash{Symbol => Object}] options
+    # @option options [Base] :algorithm (:urdna2015)
+    #   One of `:carroll2001`, `:urgna2012`, or `:urdna2015`
+    # @return [RDF::Normalize::Base]
+    # @raise [ArgumentError] selected algorithm not defined
+    def new(enumerable, options = {})
+      algorithm = options.fetch(:algorithm, :urdna2015)
+      raise ArgumentError, "No algorithm defined for #{algorithm.to_sym}" unless ALGORITHMS.has_key?(algorithm)
+      algorithm_class = const_get(ALGORITHMS[algorithm])
+      algorithm_class.new(enumerable, options)
+    end
+    module_function :new
+
+  end
+end
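The `RDF::Normalize.new` factory above looks up an algorithm symbol in a frozen table and resolves the class with `const_get`. The following is a minimal sketch of that dispatch pattern only, using standard-library Ruby. The module name `NormalizeSketch`, the method name `build`, and the `Struct`-based stand-in classes are assumptions for illustration and are not part of the gem.

```ruby
# Sketch of the symbol-to-class dispatch used by the factory method.
# NormalizeSketch and its Struct classes are hypothetical stand-ins,
# not the gem's real algorithm classes.
module NormalizeSketch
  Carroll2001 = Struct.new(:dataset, :options)
  URGNA2012   = Struct.new(:dataset, :options)
  URDNA2015   = Struct.new(:dataset, :options)

  ALGORITHMS = {
    carroll2001: :Carroll2001,
    urgna2012:   :URGNA2012,
    urdna2015:   :URDNA2015
  }.freeze

  # Resolve the requested algorithm (default :urdna2015) and instantiate it.
  def self.build(enumerable, options = {})
    algorithm = options.fetch(:algorithm, :urdna2015)
    unless ALGORITHMS.key?(algorithm)
      raise ArgumentError, "No algorithm defined for #{algorithm}"
    end
    const_get(ALGORITHMS[algorithm]).new(enumerable, options)
  end
end

default  = NormalizeSketch.build([])
explicit = NormalizeSketch.build([], algorithm: :urgna2012)
puts default.class
puts explicit.class
```

Keeping the table as symbols rather than class references means a class constant is only resolved when it is actually requested, which fits the `autoload` declarations in the module.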
data/lib/rdf/normalize/base.rb
ADDED
@@ -0,0 +1,15 @@
+module RDF::Normalize
+  ##
+  # Abstract class for pluggable normalization algorithms. Delegates to a default or selected algorithm if instantiated
+  module Base
+    attr_reader :dataset
+
+    # Enumerates normalized statements
+    #
+    # @yield statement
+    # @yieldparam [RDF::Statement] statement
+    def each(&block)
+      raise "Not Implemented"
+    end
+  end
+end
data/lib/rdf/normalize/carroll2001.rb
ADDED
@@ -0,0 +1,166 @@
+module RDF::Normalize
+  class Carroll2001
+    include RDF::Enumerable
+    include Base
+    include Utils
+
+    ##
+    # Create an enumerable with grounded nodes
+    #
+    # @param [RDF::Enumerable] enumerable
+    # @return [RDF::Enumerable]
+    def initialize(enumerable, options)
+      @dataset = enumerable
+    end
+
+    def each(&block)
+      ground_statements, anon_statements = [], []
+      dataset.each_statement do |statement|
+        (statement.has_blank_nodes? ? anon_statements : ground_statements) << statement
+      end
+
+      nodes = anon_statements.map(&:to_quad).flatten.compact.select(&:node?).uniq
+
+      # Create a hash signature of every node, based on the signature of
+      # statements it exists in.
+      # We also save hashes of nodes that cannot be reliably known; we will use
+      # that information to eliminate possible recursion combinations.
+      #
+      # Any mappings given in the method parameters are considered grounded.
+      hashes, ungrounded_hashes = hash_nodes(anon_statements, nodes, {})
+
+      # FIXME: likely need to iterate until hashes and ungrounded_hashes are the same size
+      while hashes.size != ungrounded_hashes.size
+        raise "Not done"
+      end
+
+      # Enumerate all statements, replacing nodes with new ground nodes using the hash as an identifier
+      ground_statements.each(&block)
+      anon_statements.each do |statement|
+        quad = statement.to_quad.compact.map do |term|
+          term.node? ? RDF::Node.intern(hashes[term]) : term
+        end
+        block.call RDF::Statement.from(quad)
+      end
+    end
+
+    private
+
+    # Given a set of statements, create a mapping of node => SHA1 for a given
+    # set of blank nodes.
+    #
+    # Returns a tuple of hashes: one of grounded hashes, and one of all
+    # hashes. Grounded hashes are based on non-blank nodes and grounded blank
+    # nodes, and can be used to determine if a node's signature matches
+    # another.
+    #
+    # @param [Array] statements
+    # @param [Array] nodes
+    # @param [Hash] grounded_hashes
+    #   mapping of node => SHA1 pairs as input, used to create more specific signatures of other nodes.
+    # @private
+    # @return [Hash, Hash]
+    def hash_nodes(statements, nodes, grounded_hashes)
+      hashes = grounded_hashes.dup
+      ungrounded_hashes = {}
+      hash_needed = true
+
+      # We may have to go over the list multiple times. If a node is marked as
+      # grounded, other nodes can then use it to decide their own state of
+      # grounded.
+      while hash_needed
+        starting_grounded_nodes = hashes.size
+        nodes.each do | node |
+          unless hashes.member? node
+            grounded, hash = node_hash_for(node, statements, hashes)
+            if grounded
+              hashes[node] = hash
+            end
+            ungrounded_hashes[node] = hash
+          end
+        end
+
+        # after going over the list, any nodes with a unique hash can be marked
+        # as grounded, even if we have not tied them back to a root yet.
+        uniques = {}
+        ungrounded_hashes.each do |node, hash|
+          uniques[hash] = uniques.has_key?(hash) ? false : node
+        end
+        uniques.each do |hash, node|
+          hashes[node] = hash if node
+        end
+        hash_needed = starting_grounded_nodes != hashes.size
+      end
+      [hashes, ungrounded_hashes]
+    end
+
+    # Generate a hash for a node based on the signature of the statements it
+    # appears in. Signatures consist of grounded elements in statements
+    # associated with a node, that is, anything but an ungrounded anonymous
+    # node. Creating the hash is simply hashing a sorted list of each
+    # statement's signature, which is itself a concatenation of the string form
+    # of all grounded elements.
+    #
+    # Nodes other than the given node are considered grounded if they are a
+    # member in the given hash.
+    #
+    # @param [RDF::Node] node
+    # @param [Array<RDF::Statement>] statements
+    # @param [Hash] hashes
+    # @return [Boolean, String]
+    #   a tuple consisting of grounded being true or false and the String for the hash
+    def node_hash_for(node, statements, hashes)
+      statement_signatures = []
+      grounded = true
+      statements.each do | statement |
+        if statement.to_quad.include?(node)
+          statement_signatures << hash_string_for(statement, hashes, node)
+          statement.to_quad.compact.each do | resource |
+            grounded = false unless grounded?(resource, hashes) || resource == node
+          end
+        end
+      end
+      # Note that we sort the signatures--without a canonical ordering,
+      # we might get different hashes for equivalent nodes.
+      [grounded, Digest::SHA1.hexdigest(statement_signatures.sort.to_s)]
+    end
+
+    # Provide a string signature for the given statement, collecting
+    # string signatures for grounded node elements.
+    # @return [String]
+    def hash_string_for(statement, hashes, node)
+      statement.to_quad.map {|r| string_for_node(r, hashes, node)}.join("")
+    end
+
+    # Returns true if a given node is grounded
+    # A node is grounded if it is not a blank node or it is included
+    # in the given mapping of grounded nodes.
+    # @return [Boolean]
+    def grounded?(node, hashes)
+      (!(node.node?)) || (hashes.member? node)
+    end
+
+    # Provides a string for the given node for use in a string signature
+    # Non-anonymous nodes will return their string form. Grounded anonymous
+    # nodes will return their hashed form.
+    # @return [String]
+    def string_for_node(node, hashes, target)
+      case
+      when node.nil?
+        ""
+      when node == target
+        "itself"
+      when node.node? && hashes.member?(node)
+        hashes[node]
+      when node.node?
+        "a blank node"
+      # RDF.rb auto-boxing magic makes some literals the same when they
+      # should not be; the ntriples serializer will take care of us
+      when node.literal?
+        node.class.name + RDF::NTriples.serialize(node)
+      else
+        node.to_s
+      end
+    end
+  end
+end
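Two ideas in `hash_nodes` and `node_hash_for` above can be shown in isolation: signatures are sorted before hashing so the digest does not depend on statement order, and a node whose hash is shared with no other node can be treated as grounded. The sketch below uses plain strings in place of RDF terms; the function names `signature_hash` and `uniquely_hashed` are hypothetical and chosen only for this illustration.

```ruby
require 'digest'

# Hash a node's statement signatures. Sorting first gives a canonical
# ordering, so the same set of signatures always yields the same digest.
def signature_hash(statement_signatures)
  Digest::SHA1.hexdigest(statement_signatures.sort.to_s)
end

# Given node => hash pairs, return the nodes whose hash no other node shares.
# A hash seen a second time is marked false, mirroring the `uniques` table.
def uniquely_hashed(ungrounded_hashes)
  uniques = {}
  ungrounded_hashes.each do |node, hash|
    uniques[hash] = uniques.key?(hash) ? false : node
  end
  uniques.values.select { |node| node }
end

forward  = signature_hash(["itself<p>o1", "s1<q>itself"])
backward = signature_hash(["s1<q>itself", "itself<p>o1"])
puts forward == backward

puts uniquely_hashed("_:a" => "h1", "_:b" => "h2", "_:c" => "h2").inspect
```

Nodes that still share a hash after this pass are exactly the case the `FIXME` in `each` refers to: the listing raises rather than resolving them.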
data/lib/rdf/normalize/urdna2015.rb
ADDED
@@ -0,0 +1,264 @@
+module RDF::Normalize
+  class URDNA2015
+    include RDF::Enumerable
+    include Base
+    include Utils
+
+    ##
+    # Create an enumerable with grounded nodes
+    #
+    # @param [RDF::Enumerable] enumerable
+    # @return [RDF::Enumerable]
+    def initialize(enumerable, options)
+      @dataset, @options = enumerable, options
+    end
+
+    def each(&block)
+      ns = NormalizationState.new(@options)
+      normalize_statements(ns, &block)
+    end
+
+    protected
+    def normalize_statements(ns, &block)
+      # Map BNodes to the statements they are used by
+      dataset.each_statement do |statement|
+        statement.to_quad.compact.select(&:node?).each do |node|
+          ns.add_statement(node, statement)
+        end
+      end
+
+      non_normalized_identifiers, simple = ns.bnode_to_statements.keys, true
+
+      while simple
+        simple = false
+        ns.hash_to_bnodes = {}
+
+        # Calculate hashes for first degree nodes
+        non_normalized_identifiers.each do |node|
+          hash = depth {ns.hash_first_degree_quads(node)}
+          debug("1deg") {"hash: #{hash}"}
+          ns.add_bnode_hash(node, hash)
+        end
+
+        # Create canonical replacements for hashes mapping to a single node
+        ns.hash_to_bnodes.keys.sort.each do |hash|
+          identifier_list = ns.hash_to_bnodes[hash]
+          next if identifier_list.length > 1
+          node = identifier_list.first
+          id = ns.canonical_issuer.issue_identifier(node)
+          debug("single node") {"node: #{node.to_ntriples}, hash: #{hash}, id: #{id}"}
+          non_normalized_identifiers -= identifier_list
+          ns.hash_to_bnodes.delete(hash)
+          simple = true
+        end
+      end
+
+      # Iterate over hashes having more than one node
+      ns.hash_to_bnodes.keys.sort.each do |hash|
+        identifier_list = ns.hash_to_bnodes[hash]
+
+        debug("multiple nodes") {"node: #{identifier_list.map(&:to_ntriples).join(",")}, hash: #{hash}"}
+        hash_path_list = []
+
+        # Create a hash_path_list for all bnodes using a temporary identifier used to create canonical replacements
+        identifier_list.each do |identifier|
+          next if ns.canonical_issuer.issued.include?(identifier)
+          temporary_issuer = IdentifierIssuer.new("_:b")
+          temporary_issuer.issue_identifier(identifier)
+          hash_path_list << depth {ns.hash_n_degree_quads(identifier, temporary_issuer)}
+        end
+        debug("->") {"hash_path_list: #{hash_path_list.map(&:first).inspect}"}
+
+        # Create canonical replacements for nodes
+        hash_path_list.sort_by(&:first).map(&:last).each do |issuer|
+          issuer.issued.each do |node|
+            id = ns.canonical_issuer.issue_identifier(node)
+            debug("-->") {"node: #{node.to_ntriples}, id: #{id}"}
+          end
+        end
+      end
+
+      # Yield statements using BNodes from canonical replacements
+      dataset.each_statement do |statement|
+        if statement.has_blank_nodes?
+          quad = statement.to_quad.compact.map do |term|
+            term.node? ? RDF::Node.intern(ns.canonical_issuer.identifier(term)[2..-1]) : term
+          end
+          block.call RDF::Statement.from(quad)
+        else
+          block.call statement
+        end
+      end
+    end
+
+    private
+
+    class NormalizationState
+      include Utils
+
+      attr_accessor :bnode_to_statements
+      attr_accessor :hash_to_bnodes
+      attr_accessor :canonical_issuer
+
+      def initialize(options)
+        @options = options
+        @bnode_to_statements, @hash_to_bnodes, @canonical_issuer = {}, {}, IdentifierIssuer.new("_:c14n")
+      end
+
+      def add_statement(node, statement)
+        bnode_to_statements[node] ||= []
+        bnode_to_statements[node] << statement unless bnode_to_statements[node].include?(statement)
+      end
+
+      def add_bnode_hash(node, hash)
+        hash_to_bnodes[hash] ||= []
+        hash_to_bnodes[hash] << node unless hash_to_bnodes[hash].include?(node)
+      end
+
+      # @param [RDF::Node] node
+      # @return [String] the SHA1 hexdigest hash of statements using this node, with replacements
+      def hash_first_degree_quads(node)
+        quads = bnode_to_statements[node].
+          map do |statement|
+            quad = statement.to_quad.map do |t|
+              case t
+              when node then RDF::Node("a")
+              when RDF::Node then RDF::Node("z")
+              else t
+              end
+            end
+            RDF::NQuads::Writer.serialize(RDF::Statement.from(quad))
+          end
+
+        debug("1deg") {"node: #{node}, quads: #{quads}"}
+        hexdigest(quads.sort.join)
+      end
+
+      # @param [RDF::Node] related
+      # @param [RDF::Statement] statement
+      # @param [IdentifierIssuer] issuer
+      # @param [String] position one of :s, :o, or :g
+      # @return [String] the SHA1 hexdigest hash
+      def hash_related_node(related, statement, issuer, position)
+        identifier = canonical_issuer.identifier(related) ||
+                     issuer.identifier(related) ||
+                     hash_first_degree_quads(related)
+        input = position.to_s
+        input << statement.predicate.to_ntriples unless position == :g
+        input << identifier
+        debug("hrel") {"input: #{input.inspect}, hash: #{hexdigest(input)}"}
+        hexdigest(input)
+      end
+
+      # @param [RDF::Node] identifier
+      # @param [IdentifierIssuer] issuer
+      # @return [Array<String,IdentifierIssuer>] the Hash and issuer
+      def hash_n_degree_quads(identifier, issuer)
+        debug("ndeg") {"identifier: #{identifier.to_ntriples}"}
+
+        # hash to related blank nodes map
+        map = {}
+
+        bnode_to_statements[identifier].each do |statement|
+          hash_related_statement(identifier, statement, issuer, map)
+        end
+
+        data_to_hash = ""
+
+        debug("ndeg") {"map: #{map.map {|h,l| "#{h}: #{l.map(&:to_ntriples)}"}.join('; ')}"}
+        depth do
+          map.keys.sort.each do |hash|
+            list = map[hash]
+            # Iterate over related nodes
+            chosen_path, chosen_issuer = "", nil
+            data_to_hash += hash
+
+            list.permutation do |permutation|
+              debug("ndeg") {"perm: #{permutation.map(&:to_ntriples).join(",")}"}
+              issuer_copy, path, recursion_list = issuer.dup, "", []
+
+              permutation.each do |related|
+                if canonical_issuer.identifier(related)
+                  path << canonical_issuer.issue_identifier(related)
+                else
+                  recursion_list << related if !issuer_copy.identifier(related)
+                  path << issuer_copy.issue_identifier(related)
+                end
+
+                # Skip to the next permutation if chosen path isn't empty and the path is greater than the chosen path
+                break if !chosen_path.empty? && path.length >= chosen_path.length
+              end
+              debug("ndeg") {"hash: #{hash}, path: #{path}, recursion: #{recursion_list.map(&:to_ntriples)}"}
+
+              recursion_list.each do |related|
+                result = depth {hash_n_degree_quads(related, issuer_copy)}
+                path << issuer_copy.issue_identifier(related)
+                path << "<#{result.first}>"
+                issuer_copy = result.last
+                break if !chosen_path.empty? && path.length >= chosen_path.length && path > chosen_path
+              end
+
+              if chosen_path.empty? || path < chosen_path
+                chosen_path, chosen_issuer = path, issuer_copy
+              end
+            end
+
+            data_to_hash += chosen_path
+            issuer = chosen_issuer
+          end
+        end
+
+        debug("ndeg") {"datatohash: #{data_to_hash.inspect}, hash: #{hexdigest(data_to_hash)}"}
+        return [hexdigest(data_to_hash), issuer]
+      end
+
+      protected
+
+      # FIXME: should be SHA-256.
+      def hexdigest(val)
+        Digest::SHA1.hexdigest(val)
+      end
+
+      # Group adjacent bnodes by hash
+      def hash_related_statement(identifier, statement, issuer, map)
+        statement.to_hash(:s, :p, :o, :g).each do |pos, term|
+          next if !term.is_a?(RDF::Node) || term == identifier
+
+          hash = depth {hash_related_node(term, statement, issuer, pos)}
+          map[hash] ||= []
+          map[hash] << term unless map[hash].include?(term)
+        end
+      end
+    end
+
+    class IdentifierIssuer
+      def initialize(prefix = "_:c14n")
+        @prefix, @counter, @issued = prefix, 0, {}
+      end
+
+      # Return an identifier for this BNode
+      def issue_identifier(node)
+        @issued[node] ||= begin
+          res, @counter = @prefix + @counter.to_s, @counter + 1
+          res
+        end
+      end
+
+      def issued
+        @issued.keys
+      end
+
+      def identifier(node)
+        @issued[node]
+      end
+
+      # Duplicate this issuer, ensuring that the issued identifiers remain distinct
+      # @return [IdentifierIssuer]
+      def dup
+        other = super
+        other.instance_variable_set(:@issued, @issued.dup)
+        other
+      end
+    end
+  end
+end
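The `IdentifierIssuer` at the end of this file is what makes trial permutations safe: each permutation works on a copy, so identifiers issued during a rejected attempt never leak into the original. The sketch below reproduces that behaviour with string keys instead of `RDF::Node` objects. The class name `SketchIssuer` is hypothetical, and it uses Ruby's `initialize_copy` hook in place of the `dup` override shown in the listing; both approaches give the copy its own `@issued` table.

```ruby
# Sketch of an identifier issuer with copy-on-dup semantics.
# SketchIssuer is an illustrative stand-in, not the gem's class.
class SketchIssuer
  def initialize(prefix = "_:c14n")
    @prefix, @counter, @issued = prefix, 0, {}
  end

  # Return the existing identifier for node, or issue the next one in sequence.
  def issue_identifier(node)
    @issued[node] ||= begin
      id = "#{@prefix}#{@counter}"
      @counter += 1
      id
    end
  end

  # Identifier already issued for node, or nil.
  def identifier(node)
    @issued[node]
  end

  # Nodes in the order identifiers were issued.
  def issued
    @issued.keys
  end

  # Called by dup/clone: give the copy an independent table.
  def initialize_copy(source)
    super
    @issued = @issued.dup
  end
end

issuer = SketchIssuer.new("_:b")
issuer.issue_identifier("x")
trial = issuer.dup
trial.issue_identifier("y")

puts issuer.issued.inspect
puts trial.issued.inspect
```

Because Ruby hashes preserve insertion order, `issued` returns nodes in the order they were first seen, which is what lets the canonical issuer replay a chosen issuer's nodes in sequence.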
data/lib/rdf/normalize/urgna2012.rb
ADDED
@@ -0,0 +1,47 @@
+module RDF::Normalize
+  class URGNA2012 < URDNA2015
+
+    def each(&block)
+      ns = NormalizationState.new(@options)
+      normalize_statements(ns, &block)
+    end
+
+    class NormalizationState < URDNA2015::NormalizationState
+      protected
+
+      # 2012 version uses SHA-1
+      def hexdigest(val)
+        Digest::SHA1.hexdigest(val)
+      end
+
+      # @param [RDF::Node] related
+      # @param [RDF::Statement] statement
+      # @param [IdentifierIssuer] issuer
+      # @param [String] position one of :p or :r (a direction, see below)
+      # @return [String] the SHA1 hexdigest hash
+      def hash_related_node(related, statement, issuer, position)
+        identifier = canonical_issuer.identifier(related) ||
+                     issuer.identifier(related) ||
+                     hash_first_degree_quads(related)
+        input = position.to_s
+        input << statement.predicate.to_s
+        input << identifier
+        debug("hrel") {"input: #{input.inspect}, hash: #{hexdigest(input)}"}
+        hexdigest(input)
+      end
+
+      # In URGNA2012, the position parameter passed to the Hash Related Blank Node algorithm was instead modeled as a direction parameter, where it could have the value p, for property, when the related blank node was a `subject` and the value r, for reverse or reference, when the related blank node was an `object`. Since URGNA2012 only normalized graphs, not datasets, there was no use of the `graph` position.
+      def hash_related_statement(identifier, statement, issuer, map)
+        if statement.subject.node? && statement.subject != identifier
+          hash = depth {hash_related_node(statement.subject, statement, issuer, :p)}
+          map[hash] ||= []
+          map[hash] << statement.subject unless map[hash].include?(statement.subject)
+        elsif statement.object.node? && statement.object != identifier
+          hash = depth {hash_related_node(statement.object, statement, issuer, :r)}
+          map[hash] ||= []
+          map[hash] << statement.object unless map[hash].include?(statement.object)
+        end
+      end
+    end
+  end
+end
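The two `hash_related_node` variants differ only in how the string to be hashed is assembled. As a rough sketch of that difference, the helpers below build the input strings from plain Ruby values. The function names are hypothetical, the predicate is passed as a bare IRI string, and the angle-bracket wrapping is an assumption standing in for what `to_ntriples` produces for an IRI.

```ruby
require 'digest'

# URDNA2015 style: position (:s, :o or :g), then the predicate in
# N-Triples form unless the position is the graph name, then the identifier.
def urdna2015_input(position, predicate_iri, identifier)
  input = position.to_s
  input += "<#{predicate_iri}>" unless position == :g
  input + identifier
end

# URGNA2012 style: direction (:p or :r), the predicate as a plain string,
# then the identifier. There is no graph position.
def urgna2012_input(direction, predicate_iri, identifier)
  direction.to_s + predicate_iri + identifier
end

pred = "http://example.org/knows"
puts urdna2015_input(:s, pred, "_:c14n0")
puts urdna2015_input(:g, pred, "_:c14n0")
puts urgna2012_input(:p, pred, "_:c14n0")
puts Digest::SHA1.hexdigest(urgna2012_input(:r, pred, "_:c14n0")).length
```

Since the inputs differ, the two algorithms generally produce different related-node hashes for the same data, so their canonical labels are not interchangeable.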
data/lib/rdf/normalize/utils.rb
ADDED
@@ -0,0 +1,33 @@
+module RDF::Normalize
+  module Utils
+    # Add debug event to debug array, if specified
+    #
+    # @param [String] message
+    # @yieldreturn [String] appended to message, to allow for lazy evaluation of message
+    def debug(*args)
+      options = args.last.is_a?(Hash) ? args.pop : {}
+      return unless options[:debug] || @options[:debug]
+      depth = options[:depth] || @options[:depth]
+      d_str = depth > 100 ? ' ' * 100 + '+' : ' ' * depth
+      list = args
+      list << yield if block_given?
+      message = d_str + (list.empty? ? "" : list.join(": "))
+      options[:debug] << message if options[:debug].is_a?(Array)
+      @options[:debug] << message if @options[:debug].is_a?(Array)
+      $stderr.puts(message) if @options[:debug] == true
+    end
+    module_function :debug
+
+    # Increase depth around a method invocation
+    # @yield
+    #   Yields with no arguments
+    # @yieldreturn [Object] returns the result of yielding
+    # @return [Object]
+    def depth
+      @options[:depth] += 1
+      ret = yield
+      @options[:depth] -= 1
+      ret
+    end
+  end
+end
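The `depth` and `debug` helpers work as a pair: `depth` bumps a counter around a block, and `debug` indents its message by the current count, so nested calls produce an indented trace. A minimal self-contained sketch of that interaction follows. The class name `DepthTracer` is hypothetical, messages go to an in-memory array, and unlike the listing it restores the counter in an `ensure` clause so an exception inside the block does not leave the depth raised.

```ruby
# Sketch of depth-indented debug tracing.
# DepthTracer is an illustrative stand-in for a class including Utils.
class DepthTracer
  attr_reader :log

  def initialize
    @options = { depth: 0 }
    @log = []
  end

  # Run the block one level deeper and return its result.
  def depth
    @options[:depth] += 1
    yield
  ensure
    @options[:depth] -= 1
  end

  # Record "label: message", indented by the current depth.
  # The block is only evaluated here, so callers build messages lazily.
  def debug(label)
    message = (" " * @options[:depth]) + "#{label}: #{yield}"
    @log << message
  end
end

tracer = DepthTracer.new
result = tracer.depth do
  tracer.debug("outer") { "start" }
  tracer.depth { tracer.debug("inner") { "nested" } }
  42
end

puts tracer.log
puts result
```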
data/lib/rdf/normalize/writer.rb
ADDED
@@ -0,0 +1,79 @@
+module RDF::Normalize
+  ##
+  # An RDF Graph normalization serialiser.
+  #
+  # Normalizes the enumerated statements into normal form in the form of N-Quads.
+  #
+  # @author [Gregg Kellogg](http://kellogg-assoc.com/)
+  class Writer < RDF::NQuads::Writer
+    format RDF::Normalize::Format
+
+    # @attr_accessor [RDF::Repository] Repository of statements to be serialized
+    attr_accessor :repo
+
+    ##
+    # Initializes the writer instance.
+    #
+    # @param [IO, File] output
+    #   the output stream
+    # @param [Hash{Symbol => Object}] options
+    #   any additional options
+    # @yield [writer] `self`
+    # @yieldparam [RDF::Writer] writer
+    # @yieldreturn [void]
+    # @yield [writer]
+    # @yieldparam [RDF::Writer] writer
+    def initialize(output = $stdout, options = {}, &block)
+      super do
+        @options[:depth] ||= 0
+        @repo = RDF::Repository.new
+        if block_given?
+          case block.arity
+            when 0 then instance_eval(&block)
+            else block.call(self)
+          end
+        end
+      end
+    end
+
+    ##
+    # Defer writing to epilogue
+    def write_statement(statement)
+      self
+    end
+
+    ##
+    # Outputs the Graph representation of all stored triples.
+    #
+    # @return [void]
+    def write_epilogue
+      statements = RDF::Normalize.new(@repo, @options).
+        statements.
+        reject(&:variable?).
+        map {|s| format_statement(s)}.
+        sort.
+        each do |line|
+          puts line
+        end
+    end
+
+    protected
+
+    ##
+    # Adds a statement to be serialized
+    # @param [RDF::Statement] statement
+    # @return [void]
+    def insert_statement(statement)
+      @repo.insert(statement)
+    end
+
+    ##
+    # Insert an Enumerable
+    #
+    # @param [RDF::Enumerable] enumerable
+    # @return [void]
+    def insert_statements(enumerable)
+      @repo = enumerable
+    end
+  end
+end
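The writer emits nothing while statements arrive; it collects them and produces sorted output in `write_epilogue`, because canonical labels can only be assigned once the whole dataset is known. The sketch below shows that buffer-then-flush shape with plain strings and `StringIO`. The class `DeferredLineWriter` and its `finish` method are hypothetical names, and `uniq` stands in for the de-duplication a repository would provide.

```ruby
require 'stringio'

# Sketch of a writer that defers all output until finish is called.
# DeferredLineWriter is illustrative only, not part of the gem.
class DeferredLineWriter
  def initialize(output = $stdout)
    @output = output
    @lines = []
  end

  # Buffer a line; nothing is written yet. Returns self so calls can chain.
  def <<(line)
    @lines << line
    self
  end

  # Write the buffered lines once, de-duplicated and in sorted order.
  def finish
    @lines.uniq.sort.each { |line| @output.puts(line) }
    self
  end
end

out = StringIO.new
writer = DeferredLineWriter.new(out)
writer << "<s> <p> _:c14n1 ." << "<s> <p> _:c14n0 ." << "<s> <p> _:c14n1 ."
writer.finish
puts out.string
```

Sorting the serialized lines at the end is what makes the final document independent of the order in which statements were inserted.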
metadata
ADDED
@@ -0,0 +1,160 @@
+--- !ruby/object:Gem::Specification
+name: rdf-normalize
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Gregg Kellogg
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2015-05-20 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rdf
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.1'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.1'
+- !ruby/object:Gem::Dependency
+  name: rdf-spec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.1'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.1'
+- !ruby/object:Gem::Dependency
+  name: open-uri-cached
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.0'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 0.0.5
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.0'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 0.0.5
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.2'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.2'
+- !ruby/object:Gem::Dependency
+  name: webmock
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
|
80
|
+
- !ruby/object:Gem::Version
|
81
|
+
version: '1.17'
|
82
|
+
type: :development
|
83
|
+
prerelease: false
|
84
|
+
version_requirements: !ruby/object:Gem::Requirement
|
85
|
+
requirements:
|
86
|
+
- - "~>"
|
87
|
+
- !ruby/object:Gem::Version
|
88
|
+
version: '1.17'
|
89
|
+
- !ruby/object:Gem::Dependency
|
90
|
+
name: json-ld
|
91
|
+
requirement: !ruby/object:Gem::Requirement
|
92
|
+
requirements:
|
93
|
+
- - "~>"
|
94
|
+
- !ruby/object:Gem::Version
|
95
|
+
version: '1.1'
|
96
|
+
type: :development
|
97
|
+
prerelease: false
|
98
|
+
version_requirements: !ruby/object:Gem::Requirement
|
99
|
+
requirements:
|
100
|
+
- - "~>"
|
101
|
+
- !ruby/object:Gem::Version
|
102
|
+
version: '1.1'
|
103
|
+
- !ruby/object:Gem::Dependency
|
104
|
+
name: yard
|
105
|
+
requirement: !ruby/object:Gem::Requirement
|
106
|
+
requirements:
|
107
|
+
- - "~>"
|
108
|
+
- !ruby/object:Gem::Version
|
109
|
+
version: '0.8'
|
110
|
+
type: :development
|
111
|
+
prerelease: false
|
112
|
+
version_requirements: !ruby/object:Gem::Requirement
|
113
|
+
requirements:
|
114
|
+
- - "~>"
|
115
|
+
- !ruby/object:Gem::Version
|
116
|
+
version: '0.8'
|
117
|
+
description: RDF::Normalize is a Graph normalizer for the RDF.rb library suite.
|
118
|
+
email: public-rdf-ruby@w3.org
|
119
|
+
executables: []
|
120
|
+
extensions: []
|
121
|
+
extra_rdoc_files: []
|
122
|
+
files:
|
123
|
+
- AUTHORS
|
124
|
+
- LICENSE
|
125
|
+
- README.md
|
126
|
+
- VERSION
|
127
|
+
- lib/rdf/normalize.rb
|
128
|
+
- lib/rdf/normalize/base.rb
|
129
|
+
- lib/rdf/normalize/carroll2001.rb
|
130
|
+
- lib/rdf/normalize/format.rb
|
131
|
+
- lib/rdf/normalize/urdna2015.rb
|
132
|
+
- lib/rdf/normalize/urgna2012.rb
|
133
|
+
- lib/rdf/normalize/utils.rb
|
134
|
+
- lib/rdf/normalize/writer.rb
|
135
|
+
homepage: http://github.com/gkellogg/rdf-normalize
|
136
|
+
licenses:
|
137
|
+
- Public Domain
|
138
|
+
metadata: {}
|
139
|
+
post_install_message:
|
140
|
+
rdoc_options: []
|
141
|
+
require_paths:
|
142
|
+
- lib
|
143
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
144
|
+
requirements:
|
145
|
+
- - ">="
|
146
|
+
- !ruby/object:Gem::Version
|
147
|
+
version: 1.9.2
|
148
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
149
|
+
requirements:
|
150
|
+
- - ">="
|
151
|
+
- !ruby/object:Gem::Version
|
152
|
+
version: '0'
|
153
|
+
requirements: []
|
154
|
+
rubyforge_project: rdf-normalize
|
155
|
+
rubygems_version: 2.4.7
|
156
|
+
signing_key:
|
157
|
+
specification_version: 4
|
158
|
+
summary: RDF Graph normalizer for Ruby.
|
159
|
+
test_files: []
|
160
|
+
has_rdoc: false
|
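The dependency entries above rely on the pessimistic operator `~>`, and `open-uri-cached` combines it with a `>=` bound. As a rough illustration of which versions those constraints admit, the sketch below uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The helper name `allowed?` is made up for this example.

```ruby
require "rubygems"

# Hypothetical helper: true when `version` satisfies every constraint given.
def allowed?(version, *constraints)
  Gem::Requirement.new(*constraints).satisfied_by?(Gem::Version.new(version))
end

# rdf: "~> 1.1" admits 1.1 and later 1.x releases, but not 2.0.
puts allowed?("1.1.4", "~> 1.1")
puts allowed?("2.0", "~> 1.1")

# open-uri-cached: "~> 0.0" together with ">= 0.0.5".
puts allowed?("0.0.4", "~> 0.0", ">= 0.0.5")
puts allowed?("0.0.5", "~> 0.0", ">= 0.0.5")
puts allowed?("1.0", "~> 0.0", ">= 0.0.5")
```

With a two-segment version, `~>` lets the last segment grow while the first stays fixed, so `~> 0.0` on its own would accept any 0.x release. The extra `>= 0.0.5` entry is what raises the floor.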