\documentclass[conference,letterpaper]{IEEEtran}
\usepackage{graphicx}
%\usepackage{multirow}
%\usepackage{ragged2e}
\usepackage{algpseudocode}
\usepackage[cmex10]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{fancyvrb}
\usepackage{pstricks,pst-node}
\usepackage[pdftitle={On-demand RDF to Relational Query Translation in
Samizdat RDF Store},
pdfauthor={Dmitry Borodaenko},
pdfkeywords={Semantic Web, RDF, relational databases, query
language, Samizdat},
pdfborder={0 0 0}]{hyperref}
%\urlstyle{rm}
\emergencystretch=8pt
\interdisplaylinepenalty=2500
%
\begin{document}
%
\title{On-demand RDF to Relational Query Translation in Samizdat RDF
Store}
%
\author{\IEEEauthorblockN{Dmitry Borodaenko}
\IEEEauthorblockA{Belarusian State University of Informatics and
Radioelectronics\\
6 Brovki st., Minsk, Belarus\\
Email: angdraug@debian.org}}

\maketitle % typeset the title of the contribution

\begin{abstract}

This paper presents an algorithm for on-demand translation of RDF
queries that makes it possible to map any relational data structure to
the RDF model and to perform queries over a combination of mapped
relational data and arbitrary RDF triples with performance comparable to
that of relational systems. Query capabilities implemented by the
algorithm include optional and negative graph patterns, nested
sub-patterns, and limited RDFS and OWL inference backed by database
triggers.

\end{abstract}


\section{Introduction}
\label{introduction}

% motivation for the proposed solution

A wide range of solutions that map relational data to the RDF data model
has accumulated to date~\cite{triplify}. There are several factors that
make integration of RDF and relational data important for the adoption
of the Semantic Web. One reason, shared with RDF stores based on a
triples table, is the wide availability of mature relational database
implementations which have seen decades of improvements in reliability,
scalability, and performance. The second is the fact that most
structured data available online is backed by relational databases. This
data is not likely to be replaced by pure RDF stores in the near future,
so it has to be mapped in one way or another to become available to RDF
agents. Finally, a properly normalized and indexed application-specific
relational database schema allows a DBMS to optimize complex queries in
ways that are not possible for a tree of joins over a single triples
table~\cite{sp2b}.

% what is unique about the proposed solution

In the Samizdat open publishing engine, most of the data fits into the
relational model, with the exception of reified RDF statements, which
are used in the collaborative decision making
process~\cite{samizdat-collreif} and require a more generic triple
store. The need for a generic RDF store with performance on par with a
relational database is the primary motivation behind the design of the
Samizdat RDF storage module, which is different from both triples table
based RDF stores and relational to RDF mapping systems. Unlike the
former, Samizdat can run optimized SQL queries over application-specific
tables, but unlike the latter, it is not limited by the relational
database schema and can fall back, within the same query, to a triples
table for RDF predicates that are not mapped to the relational model.

% structure of the paper

The following sections of this paper describe the targeted relational
data, the database triggers required for RDFS and OWL inference, the
query translation algorithm, the update request execution algorithm,
details of the algorithm implementation in Samizdat, an analysis of its
performance, a comparison with related work, and an outline of future
work.


\section{Relational Data}
\label{relational-data}

% formal definition of data targeted for storage

The Samizdat RDF storage module does not impose additional restrictions
on the underlying relational database schema beyond the requirements of
the SQL standard. Any legacy database may be adapted for RDF access
while retaining backwards compatibility with existing SQL queries.

The adaptation process involves adding attributes, foreign keys, tables,
and triggers to the database to enable RDF query translation and support
optional features of the Samizdat RDF store, such as statement
reification and inference for {\em rdfs:sub\-Class\-Of}, {\em
rdfs:sub\-Property\-Of}, and {\em owl:Transitive\-Property\/} rules.

The following database schema changes are required in all cases:

\begin{itemize}

\item create an {\em rdfs:Resource\/} superclass table with an
autogenerated primary key;

\item replace primary keys of mapped subclass tables with foreign keys
referencing the {\em rdfs:Resource\/} table (existing foreign keys may
need to be updated to reflect this change);

\item register {\em rdfs:subClassOf\/} inference database triggers to
update the Resource table and maintain foreign key integrity on all
changes in mapped subclass tables.

\end{itemize}

The following changes may be necessary to support optional RDF mapping
features:

\begin{itemize}

\item register database triggers for other cases of {\em
rdfs:sub\-Class\-Of\/} entailment;

\item create a triples table (required to represent non-relational RDF
data and RDF statement reification);

\item add subproperty qualifier attributes referencing the property
URIref entry in the {\em rdfs:Resource\/} table for each attribute
mapped to a superproperty;

\item create transitive closure tables and register {\em
owl:TransitivePro\-perty\/} inference triggers.

\end{itemize}


\section{Inference and Database Triggers}
\label{inference-triggers}

The Samizdat RDF storage module implements entailment rules for the
following RDFS predicates and OWL classes: {\em rdfs:sub\-Class\-Of},
{\em rdfs:sub\-Property\-Of}, {\em owl:Transitive\-Property}. Database
triggers are used to minimize the impact of RDFS and OWL inference on
query performance:

{\em rdfs:subClassOf\/} inference triggers are invoked on every insert
into and delete from a subclass table. When a tuple without a primary
key is inserted,\footnote{Insertion into a subclass table with an
explicit primary key is used in two-step resource insertion during
execution of an RDF update command (described in
section~\ref{update-execution}).} a template tuple is inserted into the
superclass table and the produced primary key is added to the new
subclass tuple. A delete operation is cascaded to all subclass and
superclass tables.

{\em rdfs:subPropertyOf\/} inference is performed during query
translation, with the help of a stored procedure that returns the
attribute value when the subproperty qualifier attribute is set, and
NULL otherwise.

{\em owl:TransitiveProperty\/} inference uses a separate transitive
closure table for each relational attribute mapped to a transitive
property. Transitive closure tables are maintained by triggers invoked
on each insert, update, and delete operation involving such an
attribute.

The transitive closure update algorithm is presented in
\figurename~\ref{transitive-closure}. The input to the algorithm is:

\begin{itemize}

\item directed labeled graph $G = \langle N, A \rangle$ where $N$ is a
set of nodes representing RDF resources and $A$ is a set of arcs $a =
\langle s, p, o \rangle$ representing RDF triples;

\item transitive property $\tau$;

\item subgraph $G_\tau \subseteq G$ such that:

\begin{equation}
a_\tau = \langle s, p, o \rangle \in G_\tau \iff
a_\tau \in G \, \wedge \, p = \tau \, ;
\end{equation}

\item graph $G_\tau^+$ containing the transitive closure of $G_\tau$;

\item update operation $\omega \in \{insert, update, delete\}$ and its
parameters $a_{old} = \langle s_\omega, \tau, o_{old} \rangle$, $a_{new}
= \langle s_\omega, \tau, o_{new} \rangle$ such that:

\begin{equation}
G_\tau' = (G_\tau \setminus \{ a_{old} \}) \cup \{ a_{new} \} \, .
\end{equation}

\end{itemize}

The algorithm transforms $G_\tau^+$ into a transitive closure of
$G_\tau'$. The algorithm assumes that $G_\tau$ is and should remain
acyclic.

\begin{figure}
\begin{algorithmic}[1]

\If {$o_{new} = s_\omega$ or $\langle o_{new}, \tau, s_\omega \rangle \in G_\tau^+$}
\State stop
\Comment refuse to create a cycle in $G_\tau$
\EndIf

\State $G_\tau \gets G_\tau'$
\Comment apply $\omega$

\If {$\omega \in \{update, delete\}$}
\State $G_\tau^+ \gets G_\tau^+ \setminus
\{ \langle s, \tau, o \rangle \mid
(s = s_\omega \, \vee \,
\langle s, \tau, s_\omega \rangle \in G_\tau^+) \, \wedge \,
\langle s_\omega, \tau, o \rangle \in G_\tau^+ \}$
\Comment remove obsolete arcs from $G_\tau^+$
\EndIf

\If {$\omega \in \{insert, update\}$}
\Comment add new arcs to $G_\tau^+$

\State $G_\tau^+ \gets G_\tau^+ \cup
\{ \langle s_\omega, \tau, o \rangle \mid
o = o_{new} \, \vee \,
\langle o_{new}, \tau, o \rangle \in G_\tau^+ \}$

\State $G_\tau^+ \gets G_\tau^+ \cup
\{ \langle s, \tau, o \rangle \mid
\langle s, \tau, s_\omega \rangle \in G_\tau^+ \, \wedge \,
\langle s_\omega, \tau, o \rangle \in G_\tau^+ \}$
\EndIf

\end{algorithmic}
\caption{Update transitive closure}
\label{transitive-closure}
\end{figure}
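
As a worked illustration of \figurename~\ref{transitive-closure},
consider a small hypothetical graph; the node names $a$, $b$, $c$, $d$
are assumptions made for this example only. Let
\begin{align*}
G_\tau &= \{ \langle a, \tau, b \rangle,
  \langle b, \tau, c \rangle \} \, , \\
G_\tau^+ &= G_\tau \cup \{ \langle a, \tau, c \rangle \} \, .
\end{align*}
For $\omega = insert$ with $s_\omega = c$ and $o_{new} = d$, the cycle
check passes, since $d \not= c$ and $\langle d, \tau, c \rangle \not\in
G_\tau^+$. The first union adds $\langle c, \tau, d \rangle$. The second
union combines every $s \in \{a, b\}$ that reaches $c$ with the only
node reachable from $c$, $o = d$, so that
\begin{equation*}
G_\tau^+ \gets G_\tau^+ \cup \{ \langle c, \tau, d \rangle,
\langle a, \tau, d \rangle, \langle b, \tau, d \rangle \} \, ,
\end{equation*}
which is the transitive closure of the chain $a \to b \to c \to d$. An
attempt to insert $\langle c, \tau, a \rangle$ instead would stop at the
first step, because $\langle a, \tau, c \rangle \in G_\tau^+$. For
$\omega = delete$ with $a_{old} = \langle b, \tau, c \rangle$ applied to
the original graph (reading the cycle check as not applicable when there
is no $o_{new}$, which is an assumption of this example), the removal
step deletes the arcs $\langle s, \tau, c \rangle$ for $s \in \{b, a\}$
and leaves $G_\tau^+ = \{ \langle a, \tau, b \rangle \}$.
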

\section{Query Pattern Translation}
\label{query-translation}

The class structure of the Samizdat RDF storage module is as follows.
The external API is provided by the {\tt RDF} class. RDF storage
configuration as described in section~\ref{relational-data} is
encapsulated in the {\tt RDFConfig} class. The concrete syntax of
Squish~\cite{samizdat-rel-rdf,squish} and SQL is abstracted into {\tt
SquishQuery} and its subclasses. The query pattern translation algorithm
is implemented by the {\tt SqlMapper} class.

% prerequisites

The input to the algorithm is as follows:

\begin{itemize}

\item mappings $M = \langle M_{rel}, M_{attr}, M_{sub}, M_{trans}
\rangle$ where $M_{rel}: P \to R$, $M_{attr}: P \to \Phi$, $M_{sub}: P
\to S$, $M_{trans}: P \to T$; $P$ is a set of mapped RDF properties, $R$
is a set of relations, $\Phi$ is a set of relation attributes, $S
\subset P$ is a subset of RDF properties that have configured
subproperties, $T \subset R$ is a set of transitive closures (as
described in sections~\ref{relational-data} and
\ref{inference-triggers});

\item graph pattern $\Psi = \langle \Psi_{nodes}, \Psi_{arcs} \rangle =
\Pi \cup N \cup \Omega$, where $\Pi$, $N$, and $\Omega$ are main (``must
bind''), negative (``must not bind''), and optional (``may bind'') graph
patterns respectively, such that $\Pi$, $N$, and $\Omega$ share no arcs,
and $\Pi$, $\Pi \cup N$ and $\Pi \cup \Omega$ are connected
graphs.\footnote{Arcs with the same subject, object, and predicate but
different bind mode are treated as distinct.}

\item global filter condition $F_g \in F$ and local filter conditions
$F_c: \Psi_{arcs} \to F$ where $F$ is a set of all literal conditions
expressible in the query language syntax.

\end{itemize}

For example, consider the following Squish query and its graph pattern
$\Psi$ presented in \figurename~\ref{graph-pattern}.

\begin{Verbatim}[fontsize=\scriptsize]
SELECT ?msg
WHERE (rdf::predicate ?stmt dc::relation)
      (rdf::subject ?stmt ?msg)
      (rdf::object ?stmt ?tag)
      (dc::date ?stmt ?date)
      (s::rating ?stmt ?rating
        FILTER ?rating >= :threshold)
EXCEPT (dct::isPartOf ?msg ?parent)
OPTIONAL (dc::language ?msg ?original_lang)
         (s::isTranslationOf ?msg ?translation)
         (dc::language ?translation ?translation_lang)
LITERAL ?original_lang = :lang
        OR ?translation_lang = :lang
GROUP BY ?msg
ORDER BY max(?date) DESC
\end{Verbatim}

\begin{figure}

\centering
\psset{unit=3.8mm,labelsep=0.2pt}
\begin{pspicture}[showgrid=false](0,0)(23,12)
\footnotesize

\rput(2.5,5.5){\ovalnode{msg}{\sl ?msg}}
\rput(10,8){\ovalnode{stmt}{\sl ?stmt}}
\rput(2.5,8){\ovalnode{rel}{\it dc:relation}}
\rput(5,10.5){\ovalnode{tag}{\sl ?tag}}
\rput(14,10.5){\ovalnode{date}{\sl ?date}}
\rput(17,8){\ovalnode{rating}{\sl ?rating}}
\rput(14,5.5){\ovalnode{parent}{\sl ?parent}}
\rput(8,1){\ovalnode{origlang}{\sl ?original\_lang}}
\rput(11.2,3.3){\ovalnode{trans}{\sl ?translation}}
\rput(19.2,1){\ovalnode{translang}{\sl ?translation\_lang}}

\ncline{<-}{msg}{stmt} \aput{:U}(0.4){\it rdf:subject}
\ncline{<-}{rel}{stmt} \aput{:U}{\it rdf:predicate}
\ncline{<-}{tag}{stmt} \aput{:U}{\it rdf:object}
\ncline{->}{stmt}{date} \aput{:U}{\it dc:date}
\ncline{->}{stmt}{rating} \aput{:U}{\it s:rating}
\ncline{->}{msg}{parent} \aput{:U}(0.6){\it dct:isPartOf}
\ncline{->}{msg}{origlang} \aput{:U}(0.6){\it dc:language}
\ncline{<-}{msg}{trans} \aput{:U}(0.65){\it s:isTranslationOf}
\ncline{->}{trans}{translang} \aput{:U}(0.6){\it dc:language}

\psccurve[curvature=0.75 0.1 0,linestyle=dashed,showpoints=false]%
(0.3,5)(0.3,10)(3,11.3)(20,9.5)(20,7)(8.5,7)(2.5,4.5)
\rput(18.8,10){$\Pi$}
\rput(16.5,5.5){$N$}
\rput(12.5,1.5){$\Omega$}

\end{pspicture}

\caption{Graph pattern $\Psi$ for the example query}
\label{graph-pattern}
\end{figure}

The output of the algorithm is a join expression $F$ and condition $W$
ready for composition into {\tt FROM} and {\tt WHERE} clauses of an SQL
{\tt SELECT} statement.

In the algorithm description below, $\mathrm{id}(r)$ is used to denote
the primary key of relation $r \in R$, and $\rho(n)$ is used to denote
the value of $\mathrm{id}(Resource)$ for non-variable node $n \in
\Psi_{nodes}$ where such value is known during query
translation.\footnote{E.g. Samizdat uses {\em site-ns/resource-id}
notation for internal resource URIrefs.}

% the algorithm

Key steps of the query pattern translation algorithm correspond to the
following private methods of {\tt SqlMapper}:

{\tt label\_pattern\_components}: Label every connected component of
$\Pi$, $N$, and $\Omega$ with different colors $K$ such that $K_\Pi:
\Pi_{nodes} \to \mathbb{K}, K_N: N_{nodes} \to \mathbb{K}, K_\Omega:
\Omega_{nodes} \to \mathbb{K}, K(n) = K_\Pi(n) \cup K_N(n) \cup
K_\Omega(n)$. The Two-pass Connected Component Labeling
algorithm~\cite{shapiro} is used with a special case to exclude nodes
present in $\Pi$ from neighbour lists while labeling $N$ and $\Omega$.
The special case ensures that parts of $N$ and $\Omega$ which are only
connected through a node in $\Pi$ are labeled with different colors.

{\tt map\_predicates}: Map each arc $c = \langle s, p, o \rangle \in
\Psi_{arcs}$ to the relational data model according to $M$: define
mapping $M_{attr}^{pos}: \Psi_{arcs} \times \Psi_{nodes} \to \Phi$ such
that $M_{attr}^{pos}(c, s) = \mathrm{id}( M_{rel}(p) ),
M_{attr}^{pos}(c, o) = M_{attr}(p)$; replace each unmapped arc with its
reification and map the resulting arcs in the same manner;\footnote{$M$
is expected to map reification properties to the triples table.} for
each arc labeled with a subproperty predicate, add an arc mapped to the
subproperty qualifier attribute. For each node $n \in \Psi_{nodes}$,
find adjacent arcs $\Psi_{arcs}^n = \{\langle s, p, o \rangle \mid n
\in \{s, o\}\}$ and determine its binding mode $\beta_{node}:
\Psi_{nodes} \to \{ \Pi, N, \Omega \}$ such that $\beta_{node}(n) =
max(\beta_{arc}(c) \, \forall c \in \Psi_{arcs}^n)$ where
$\beta_{arc}(c)$ reflects which of the graph patterns $\{ \Pi, N, \Omega
\}$ contains arc $c$, and the order of precedence used by $max$ is $\Pi
> N > \Omega$.

{\tt define\_relation\_aliases}: Map each node in $\Psi$ to one or more
relation aliases $a \in \mathbb{A}$ according to the algorithm described
in \figurename~\ref{define-relation-aliases}. The algorithm produces
mapping $C_a: \Psi_{arcs} \to \mathbb{A}$ which links every arc in
$\Psi$ to an alias, and mappings $A = \langle A_{rel}, A_{node},
A_\beta, A_{filter} \rangle$ where $A_{rel}: \mathbb{A} \to R$,
$A_{node}: \mathbb{A} \to \Psi_{nodes}$, $A_\beta: \mathbb{A} \to \{
\Pi, N, \Omega \}$, $A_{filter}: \mathbb{A} \to F$ which record the
relation, node, bind mode, and a filter condition for each alias.
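
The {\tt label\_pattern\_components} and {\tt map\_predicates} steps can
be traced on the example query. The following is a sketch that assumes
the partition drawn in \figurename~\ref{graph-pattern} (the {\tt WHERE}
arcs form $\Pi$, the {\tt EXCEPT} arc forms $N$, the {\tt OPTIONAL} arcs
form $\Omega$) and ignores arcs added by reification and subproperty
handling. Node {\sl ?msg} is adjacent to arcs from all three patterns,
while {\sl ?parent} and {\sl ?translation} are each adjacent to arcs
from one pattern only:
\begin{align*}
\beta_{node}(\mbox{\sl ?msg}) &= max(\Pi, N, \Omega) = \Pi \, , \\
\beta_{node}(\mbox{\sl ?parent}) &= max(N) = N \, , \\
\beta_{node}(\mbox{\sl ?translation}) &= max(\Omega) = \Omega \, .
\end{align*}
During labeling, {\sl ?msg} is present in $\Pi$ and is therefore
excluded from neighbour lists when $\Omega$ is processed, so {\sl
?original\_lang} on one side and {\sl ?translation} together with {\sl
?translation\_lang} on the other end up in two components of $\Omega$
with different colors.
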

\begin{figure}
\begin{algorithmic}[1]

\ForAll {$n \in \Psi_{nodes}$}
\ForAll {$c = \langle s, p, o \rangle \in \Psi_{arcs} \mid s = n \, \wedge \, C_a(c) = \emptyset$}
\If {$\exists c' = \langle s', p', o' \rangle \mid
n \in \{s', o'\} \, \wedge \,
C_a(c') \not= \emptyset \, \wedge \,
M_{rel}(p') = M_{rel}(p)$}

\State $C_a(c) \gets C_a(c')$
\Comment Reuse the alias assigned to an arc adjacent to $n$ and
mapped to the same relation

\Else
\Comment Create new alias
\State $a = max(\mathbb{A}) + 1$;
$\mathbb{A} \gets \mathbb{A} \cup \{ a\}$;
$C_a(c) \gets a$
\State $A_{node}(a) \gets n$, $A_{filter}(a) \gets \emptyset$

\If {$M_{trans}(p) = \emptyset$}
\Comment Use base relation
\State $A_{rel}(a) \gets M_{rel}(p)$
\State $A_\beta(a) \gets \beta_{node}(n)$

\Else
\Comment Use transitive closure
\State $A_{rel}(a) \gets M_{trans}(p)$
\State $A_\beta(a) \gets \beta_{arc}(c)$
\State \Comment Use arc's bind mode instead of node's
\EndIf
\EndIf
\EndFor
\EndFor

\ForAll {$c \in \Psi_{arcs}$}
\State $A_{filter}( C_a(c) ) \gets A_{filter}( C_a(c) ) \cup F_c(c)$
\State \Comment Add arc filter to the linked alias filters
\EndFor

\end{algorithmic}
\caption{Define relation aliases}
\label{define-relation-aliases}
\end{figure}
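
The alias assignment can be followed for node {\sl ?stmt} of the example
query. This sketch assumes, as the SQL listing given later in this
section suggests, that {\it rdf:subject}, {\it rdf:predicate}, {\it
rdf:object}, and {\it s:rating} are all mapped to the $Statement$
relation, that {\it dc:date} is mapped to $Resource$, and that none of
these properties is transitive; the alias names $a$ and $b$ follow
\figurename~\ref{join-graph}. The first arc with subject {\sl ?stmt}
finds no previously aliased adjacent arc, so a new alias is created:
\begin{align*}
A_{rel}(a) &= Statement \, , \\
A_{node}(a) &= \mbox{\sl ?stmt} \, , \\
A_\beta(a) &= \beta_{node}(\mbox{\sl ?stmt}) = \Pi \, .
\end{align*}
The remaining three arcs mapped to $Statement$ reuse $a$. The {\it
dc:date} arc is mapped to a different relation, so it receives a second
alias $b$ with $A_{rel}(b) = Resource$ and $A_{node}(b) = \mbox{\sl
?stmt}$. The final loop attaches the local filter of the {\it s:rating}
arc to its alias, so that $A_{filter}(a)$ contains the condition {\tt
?rating >= :threshold}.
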

{\tt transform}: Define bindings $B: \Psi_{nodes} \to \mathbb{B}$ where
$\mathbb{B} = \{\{ \langle a, f \rangle \mid a \in \mathbb{A}, f \in
\Phi \}\}$ of graph pattern nodes to sets of pairs of relation aliases
and attributes, such that

\begin{equation}
\begin{split}
\langle a, f \rangle \in B(n) \iff
&\exists c \in \Psi_{arcs}^n \\
&C_a(c) = a, M_{attr}^{pos}(c, n) = f \, .
\end{split}
\end{equation}

Transform graph pattern $\Psi$ into relational query graph $Q = \langle
\mathbb{A}, J \rangle$ where nodes $\mathbb{A}$ are relation aliases
defined earlier and edges $J = \{ \langle b_1, b_2, n \rangle \mid b_1 =
\langle a_1, f_1 \rangle \in B(n), b_2 = \langle a_2, f_2 \rangle \in
B(n), a_1 \not= a_2 \}$ are join conditions. Ground non-variable nodes
according to the algorithm defined in
\figurename~\ref{ground-non-variable-nodes}. Record list of grounded nodes $G
\subseteq \Psi_{nodes}$ such that

\begin{equation}
\begin{split}
n \in G \iff &n \in F_g
\,\vee\, \exists \langle b_1, b_2, n \rangle \in J \\
&\vee\, \exists b \in B(n) \, \exists a \in \mathbb{A} \:
b \in A_{filter}(a) \, .
\end{split}
\end{equation}

\begin{figure}
\begin{algorithmic}[1]

\State $\exists b = \langle a, f \rangle \in B(n)$
\Comment Take any binding of $n$
\If {$n$ is an internal resource and $\rho(n) = i$}
\State $A_{filter}(a) \gets A_{filter}(a) \cup (b = i)$
\ElsIf {$n$ is a query parameter or a literal}
\State $A_{filter}(a) \gets A_{filter}(a) \cup (b = n)$
\ElsIf {$n$ is a URIref}
\Comment Add a join to a URIref tuple in Resource relation
\State $\mathbb{A} \gets \mathbb{A} \cup \{ a_r \}$;
$A_{node}(a_r) = n$;
$A_{rel}(a_r) = Resource$;
$A_\beta(a_r) = \beta_{node}(n)$
\State $B(n) \gets B(n) \cup \langle a_r, \mathrm{id}(Resource) \rangle;
J \gets J \cup
\{ \langle b, \langle a_r, \mathrm{id}(Resource) \rangle, n \rangle \}$
\State $A_{filter}(a_r) = A_{filter}(a_r) \cup (
\langle a_r, literal \rangle = f \wedge
\langle a_r, uriref \rangle = t \wedge
\langle a_r, label \rangle = n )$
\EndIf

\end{algorithmic}
\caption{Ground non-variable nodes}
\label{ground-non-variable-nodes}
\end{figure}

Transformation of the example query presented above results in the
relational query graph shown in \figurename~\ref{join-graph}.

\begin{figure}

\centering
\psset{unit=3.8mm,labelsep=0.2pt}
\begin{pspicture}[showgrid=false](0,0)(23,13)
\footnotesize

\rput(1,6){\circlenode{b}{\vphantom{Ij}b}}
\rput(6.7,6){\circlenode{a}{\vphantom{Ij}a}}
\rput(12.8,6){\circlenode{c}{\vphantom{Ij}c}}
\rput(2,11){\circlenode{d}{\vphantom{Ij}d}}
\rput(1,1){\circlenode{g}{\vphantom{Ij}g}}
\rput(22,11){\circlenode{f}{\vphantom{Ij}f}}
\rput(20,1){\circlenode{e}{\vphantom{Ij}e}}

\ncline{-}{b}{a} \aput{:U}(0.4){a.id = b.id} \bput{:U}(0.35){?stmt}
\ncline{-}{a}{c} \aput{:U}{a.subject = c.id} \bput{:U}{?msg}
\ncline{-}{d}{a} \aput{:U}{a.subject = d.id} \bput{:U}(0.4){?msg}
\ncline{-}{g}{a} \aput{:U}(0.43){a.predicate = g.id} \bput{:U}{\it dc:relation}
\ncline{-}{c}{f} \aput{:U}{c.part\_of\_subproperty = f.id} \bput{:U}{\it s:isTranslationOf}
\ncline{-}{c}{e} \aput{:U}{c.part\_of = e.id} \bput{:U}{?translation}

\pspolygon[linestyle=dashed,linearc=0.8](0.1,0.1)(0.1,11.9)(14.5,11.9)(14.5,0.1)
\rput(13.8,1){$P_1$}

\end{pspicture}

\caption{Relational query graph $Q$ for the example query}
\label{join-graph}
\end{figure}
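
Reading the edge labels of \figurename~\ref{join-graph} back into the
definitions of $B$ and $J$ gives, for the two variables that connect the
aliases inside $P_1$ (a sketch; the sets are not claimed to be
complete):
\begin{align*}
B(\mbox{\sl ?stmt}) &\supseteq \{ \langle a, id \rangle,
  \langle b, id \rangle \} \, , \\
B(\mbox{\sl ?msg}) &\supseteq \{ \langle a, subject \rangle,
  \langle c, id \rangle, \langle d, id \rangle \} \, .
\end{align*}
Each pair of bindings of the same node with different aliases yields a
join condition, for instance $\langle \langle a, subject \rangle,
\langle c, id \rangle, \mbox{\sl ?msg} \rangle \in J$, which appears in
the figure as the edge labeled ``a.subject = c.id''.
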

{\tt generate\_tables\_and\_conditions}: Produce ordered connected
minimum edge-disjoint tree cover $P$ for relational query graph $Q$ such
that $\forall P_i \in P$ \, $\forall j = \langle b_{j1}, b_{j2}, n_j
\rangle \in P_i$ \, $\forall k = \langle b_{k1}, b_{k2}, n_k \rangle \in
P_i$:

\begin{gather}
K(n_j) \cap K(n_k) \not= \emptyset \, , \\
\beta_{node}(n_j) = \beta_{node}(n_k) = \beta_{tree}(P_i) \, ,
\end{gather}

starting with $P_1$ such that $\beta_{tree}(P_1) = \Pi$ (it follows from
definitions of $\Psi$ and {\tt transform} that $P_1$ is the only such
tree and covers all join conditions $\langle b_1, b_2, n \rangle \in J$
such that $\beta_{node}(n) = \Pi$). Encode $P_1$ as the root inner join.
Encode other trees with at least one edge as subqueries. Left join
subqueries and aliases representing roots of zero-length trees into join
expression $F$. For each $P_i$ such that $\beta_{tree}(P_i) = N$, find a
binding $b = \langle a, f \rangle \in P_i$ such that $a \in P_1 \cap
P_i$ and add ($b$ {\tt IS NULL}) condition to $W$. For each non-grounded
node $n \not\in G$ with a binding $b = \langle a, f \rangle \in B(n)$
such that $a \in P_1$, add ($b$ {\tt IS NOT NULL}) condition to $W$ if
$\beta_{node}(n) = \Pi$, or ($b$ {\tt IS NULL}) condition if
$\beta_{node}(n) = N$. Add $F_g$ to $W$.

Translation of the example query presented earlier results in the
following SQL:

\begin{Verbatim}[fontsize=\scriptsize]
SELECT DISTINCT a.subject, max(b.published_date)
FROM Statement AS a
INNER JOIN Resource AS b ON (a.id = b.id)
INNER JOIN Resource AS c ON (a.subject = c.id)
INNER JOIN Message AS d ON (a.subject = d.id)
INNER JOIN Resource AS g ON (a.predicate = g.id)
  AND (g.literal = 'false' AND g.uriref = 'true'
  AND g.label = 'http://purl.org/dc/elements/1.1/relation')
LEFT JOIN (
  SELECT e.language AS _field_b, c.id AS _field_a
  FROM Message AS e
  INNER JOIN Resource AS f ON (f.literal = 'false'
    AND f.uriref = 'true' AND f.label =
    'http://www.nongnu.org/samizdat/rdf/schema#isTranslationOf')
  INNER JOIN Resource AS c ON (c.part_of_subproperty = f.id)
    AND (c.part_of = e.id)
) AS _subquery_a ON (c.id = _subquery_a._field_a)
WHERE (b.published_date IS NOT NULL)
  AND (a.object IS NOT NULL) AND (a.rating IS NOT NULL)
  AND (c.part_of IS NULL) AND (a.rating >= ?)
  AND (d.language = ? OR _subquery_a._field_b = ?)
GROUP BY a.subject ORDER BY max(b.published_date) DESC
\end{Verbatim}

|
594
|
+
\section{Update Command Execution}
|
595
|
+
\label{update-execution}
|
596
|
+
|
597
|
+
Update command uses the same graph pattern structure as a query, and
|
598
|
+
additionally defines a set $\Delta \subset \Psi_{nodes}$ of variables
|
599
|
+
representing new RDF resources and a mapping $U: \Psi_{nodes} \to
|
600
|
+
\mathbb{L}$ of variables to literal values. Execution of an update
|
601
|
+
command starts with query pattern translation using the algorithm
|
602
|
+
described in section~\ref{query-translation}. The variables $\Psi$, $A$,
|
603
|
+
$Q$, etc. produced by pattern translation are used in the subsequent
|
604
|
+
stages as described below:
|
605
|
+
|
606
|
+
\begin{enumerate}
|
607
|
+
|
608
|
+
% node values
|
609
|
+
|
610
|
+
\item Construct node values mapping $V: \Psi_{nodes} \to \mathbb{L}$
|
611
|
+
using the algorithm defined in \figurename~\ref{node-values}. Record
|
612
|
+
resources inserted into the database during this stage in $\Delta_{new}
|
613
|
+
\subset \Psi_{nodes}$ (it follows from the algorithm definition that
|
614
|
+
$\Delta \subseteq \Delta_{new}$).
|
615
|
+
|
616
|
+
\begin{figure}
|
617
|
+
\begin{algorithmic}[1]
|
618
|
+
|
619
|
+
\ForAll {$n \in \Psi_{nodes}$}
|
620
|
+
\If {$n$ is an internal resource and $\rho(n) = i$}
|
621
|
+
\State $V(n) \gets i$
|
622
|
+
\ElsIf {$n$ is a query parameter or a literal}
|
623
|
+
\State $V(n) \gets n$
|
624
|
+
\ElsIf {$n$ is a variable}
|
625
|
+
\If {$\nexists c = \langle n, p, o \rangle \in \Psi_{arcs}$}
|
626
|
+
\State \Comment If found only in object position
|
627
|
+
\State $V(n) \gets U(n)$
|
628
|
+
\Else
|
629
|
+
\If {$n \not\in \Delta$}
|
630
|
+
\State $V(n) \gets \mathrm{SquishSelect}(n, \Psi^{n*})$
|
631
|
+
\EndIf
|
632
|
+
\If {$V(n) = \emptyset$}
|
633
|
+
\State Insert $n$ into $Resource$ relation
|
634
|
+
\State $V(n) \gets \rho(n)$
|
635
|
+
\State $\Delta_{new} \gets \Delta_{new} \cup n$
|
636
|
+
\EndIf
|
637
|
+
\EndIf
|
638
|
+
\ElsIf {$n$ is a URIref}
|
639
|
+
\State Select $n$ from $Resource$ relation, insert if missing
|
640
|
+
\State $V(n) \gets \rho(n)$
|
641
|
+
\EndIf
|
642
|
+
\EndFor
|
643
|
+
|
644
|
+
\end{algorithmic}
|
645
|
+
\caption{Determine node values. $\Psi^{n*}$ is a subgraph of $\Psi$
|
646
|
+
reachable from $n$. $\mathrm{SquishSelect}(n, \Psi)$ finds a mapping of
|
647
|
+
variable $n$ that satisfies pattern $\Psi$.}
|
648
|
+
\label{node-values}
|
649
|
+
\end{figure}
|
650
|
+
|
651
|
+
% data assignment

\item For each alias $a \in \mathbb{A}$, find the subset of the graph
pattern $\Psi_{arcs}^a \subseteq \Psi_{arcs}$ such that $c \in
\Psi_{arcs}^a \iff C_a(c) = a$, select a key node $k$ such that $\exists
c = \langle k, p, o \rangle \in \Psi_{arcs}^a$, and collect a map $D_a:
\Phi \to \mathbb{L}$ of fields to values such that $\forall c = \langle
s, p, o \rangle \in \Psi_{arcs}^a \; \exists D_a(o) = V(o)$. If $k \in
\Delta_{new}$ and $A_{rel}(a) \not= Resource$, transform $D_a$ into an
SQL {\tt INSERT} into $A_{rel}(a)$ with explicit primary key assignment
$\mathrm{id}_k(A_{rel}(a)) \gets V(k)$. Otherwise, transform $D_a$
into an {\tt UPDATE} statement on the tuple in $A_{rel}(a)$ for which
$\mathrm{id}_k(A_{rel}(a)) = V(k)$.

% iterative assertions

\item Execute the SQL statements produced in the previous stage within a
single transaction, in an order that resolves their mutual references.

\end{enumerate}
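As an illustration of the stages above, consider a hypothetical assertion
whose pattern contains a single arc $c = \langle m, p, t \rangle$, where
$m$ is a variable, $t$ is a literal, and $C_a(c) = a$. The relation
$Message$, its field $title$, and the node names $m$ and $t$ are
assumptions made for this example only and are not taken from the
Samizdat schema. Let $A_{rel}(a) = Message$, let $p$ be mapped to the
field $title$, and let $m \in \Delta$. Since $m$ occurs in subject
position and $m \in \Delta$, the first stage skips
$\mathrm{SquishSelect}$, inserts $m$ into $Resource$, and yields
\[
V(t) = t, \quad V(m) = \rho(m), \quad \Delta_{new} = \{m\}.
\]
In the second stage, $k = m$ and $D_a = \{title \mapsto V(t)\}$. Because
$k \in \Delta_{new}$ and $A_{rel}(a) \not= Resource$, $D_a$ becomes an
{\tt INSERT} into $Message$ that assigns $\mathrm{id}_k(Message) \gets
\rho(m)$ and $title \gets t$. Had $m \not\in \Delta$ and had
$\mathrm{SquishSelect}$ matched it to an existing resource, the same
$D_a$ would instead become an {\tt UPDATE} of the tuple for which
$\mathrm{id}_k(Message) = V(m)$.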


\section{Implementation}

The algorithms described in the previous sections are implemented by the
Samizdat RDF storage module, which is used as the primary means of data
access in the Samizdat open publishing system. The module is written in
the Ruby programming language and is supported by several triggers
written in procedural SQL. The module and the whole Samizdat engine are
available under the GNU General Public License.

Samizdat exposes all RDF resources underpinning the structure and
content of the site. An HTTP request with the URL of any internal
resource yields a page with detailed information about the resource and
its relations with other resources. Furthermore, Samizdat provides a
graphical interface for composing arbitrary Squish
queries.\footnote{The complexity of user queries is limited to a
configurable maximum number of triples in the graph pattern to prevent
abuse.} Queries may be published so that other users can modify and
reuse them; the results of a query may be accessed either as plain HTML
or as an RSS feed.

\section{Evaluation of Results}
\label{evaluation}

%\enlargethispage{-1ex}

Samizdat performance was measured using the Berlin SPARQL Benchmark
(BSBM)~\cite{bsbm}, with the following variations: a functional
equivalent of the BSBM test driver was implemented in Ruby and Squish
(instead of Java and SPARQL); the test platform included an Intel Core 2
Duo (instead of Quad) clocked at the same frequency, and 2GB of memory
(instead of 8GB). In this environment, Samizdat was able to process
25287 complete query mixes per hour (QMpH) on a dataset with 1M triples,
and achieved 18735 QMpH with 25M triples, in both cases exceeding the
figures for all RDF stores reported in~\cite{bsbm}.

In production, Samizdat was able to serve peak loads of up to 5K hits
per hour without congestion for a site with a dataset sized at 100K
triples in a shared VPS environment. Regeneration of the site frontpage
on the same dataset executes 997 Squish queries and completes in 7.7s,
which is comparable to RDBMS-backed content management systems.
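For orientation, the figures above can be restated per second and per
query. These are plain unit conversions of the reported numbers (QMpH
counts query mixes per hour), not additional measurements:
\begin{align*}
25287 / 3600 &\approx 7.0~\text{query mixes/s (1M triples)},\\
18735 / 3600 &\approx 5.2~\text{query mixes/s (25M triples)},\\
5000 / 3600 &\approx 1.4~\text{hits/s at peak load},\\
7.7~\mathrm{s} / 997 &\approx 7.7~\mathrm{ms}~\text{per Squish query}.
\end{align*}
The benchmark throughput on the 25M-triple dataset is thus about 74\% of
the throughput on the 1M-triple dataset ($18735 / 25287 \approx 0.74$).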


\section{Comparison with Related Work}
\label{related-work}

As mentioned in section~\ref{introduction}, there exists a wide range of
solutions for relational to RDF mapping. Besides Samizdat, the approach
based on automatic on-demand translation of RDF queries into SQL is also
implemented by Federate~\cite{federate}, D2RQ~\cite{d2rq}, and
Virtuoso~\cite{virtuoso}.

While being one of the first solutions to provide on-demand relational
to RDF mapping, Samizdat remains one of the most advanced in terms of
query capabilities. Its single largest drawback is the lack of
compatibility with SPARQL; at the same time, in some regards it exceeds
the capabilities of other solutions.

The alternative that is closest to Samizdat in terms of query
capabilities is Virtuoso RDF Views: it is the only other
relational-to-RDF mapping solution that provides partial RDFS and OWL
inference, aggregation, and an update language. Still, there are
substantial differences between these two projects. First of all,
Samizdat RDF store is a small module (1000 lines of Ruby and 200 lines
of SQL) that can be used with a variety of RDBMSes, while Virtuoso RDF
Views is tied to its own RDBMS. Virtuoso does not support implicit
statement reification, although its design is compatible with this
feature. Finally, Virtuoso relies on SQL unions for queries with
unspecified predicates and for RDFS and OWL inference. While allowing
for greater flexibility than the database triggers described in
section~\ref{inference-triggers}, the iterative union operation has a
considerable impact on query performance.


\section{Future Work}
\label{future-work}

Ever since the SPARQL Recommendation was published by the
W3C~\cite{sparql}, SPARQL support has been at the top of the Samizdat
RDF store to-do list. SPARQL syntax is considerably more expressive than
Squish and will require some effort to implement in Samizdat. However,
since the design of the implementation separates the syntactic layer
from the query translation logic, the same algorithms as described in
this paper can be used to translate SPARQL patterns to SQL with minimal
changes. The most substantial changes are expected to be required for
the explicit grouping of optional graph patterns and the associated
filter scope issues~\cite{cyganiak}.

Samizdat RDF store should be made more adaptable to a wider variety of
problem domains. The query translation algorithm should be augmented to
translate an ambiguously mapped query (including queries with
unspecified predicates) to a union of alternative interpretations.
The mapping of the relational schema should be generalized, including
support for multi-part keys and more generic stored procedures for
reification and inference. A standard RDB2RDF mapping should be
implemented when the W3C publishes a specification to that end.
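The union of alternative interpretations can be sketched as follows.
This is an assumption about a possible future design, not a description
of the current implementation, and the symbols $f_i$ and $T_i$ are
introduced here for illustration only. Suppose a triple with an
unspecified predicate could be mapped to any of the fields $f_1, \dots,
f_n$, and let $T_i$ denote the SQL query that the existing translation
algorithm produces when the predicate is fixed to the property mapped to
$f_i$. The ambiguous query would then be translated to
\[
T = T_1 \cup T_2 \cup \dots \cup T_n,
\]
expressed in SQL as a {\tt UNION} of the $n$ alternatives.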


\section{Conclusions}

The on-demand RDF to relational query translation algorithm described
in this paper utilizes existing relational databases to their full
potential, including indexing, transactions, and procedural SQL, to
provide efficient access to RDF data. The implementation of this
algorithm in the Samizdat RDF storage module has been tried in a
production environment and has demonstrated how Semantic Web
technologies can be introduced into an application serving thousands of
users without imposing additional requirements on hardware resources.

\vspace{1ex}

% ---- Bibliography ----
%
\begin{thebibliography}{19}

%\bibitem {expressive-power-of-sparql}
%Anglez, R., Gutierrez, C.:
%The Expressive Power of SPARQL. In: A. Sheth et al. (Eds.) ISWC 2008.
%LNCS, vol. 5318, pp. 82-97. Springer, Heidelberg (2008)\\
%\url{http://www.dcc.uchile.cl/~cgutierr/papers/expPowSPARQL.pdf}

\bibitem {triplify}
Auer, S., Dietzold, S., Lehmann, J., Hellmann, S., Aumueller, D.:
Triplify -- Light-Weight Linked Data Publication from Relational
Databases. WWW 2009, Madrid, Spain (2009)\\
\url{http://www.informatik.uni-leipzig.de/~auer/publication/triplify.pdf}

%\bibitem {swad-storage}
%Beckett, Dave:
%Semantic Web Scalability and Storage: Survey of Free Software / Open
%Source RDF storage systems. SWAD-Europe Deliverable 10.1 (2001)\\
%\url{http://www.w3.org/2001/sw/Europe/reports/rdf\_scalable\_storage\_report}

%\bibitem {swad-rdbms-mapping}
%Beckett, D., Grant, J.:
%Semantic Web Scalability and Storage: Mapping Semantic Web Data with
%RDBMSes, SWAD-Europe Deliverable 10.2 (2001)\\
%\url{http://www.w3.org/2001/sw/Europe/reports/scalable\_rdbms\_mapping\_report}

%\bibitem {cwm}
%Berners-Lee, T., Kolovski, V., Connolly, D., Hendler, J. Scharf, Y.:
%A Reasoner for the Web. Theory and Practice of Logic Programming (TPLP),
%special issue on Logic Programming and the Web (2000)\\
%\url{http://www.w3.org/2000/10/swap/doc/paper/}

\bibitem {bsbm}
Bizer, C., Schultz, A.:
The Berlin SPARQL Benchmark. International Journal On Semantic Web and
Information Systems (IJSWIS), Volume 5, Issue 2 (2009)\\
\url{http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/}

\bibitem {d2rq}
Bizer, C., Seaborne, A.:
D2RQ -- Treating non-RDF databases as virtual RDF graphs. In: ISWC 2004
(posters)\\
\url{http://www.wiwiss.fu-berlin.de/bizer/D2RQ/spec/}

%\bibitem {samizdat-euruko}
%Borodaenko, Dmitry:
%RDF storage for Ruby: the case of Samizdat. EuRuKo 2003, Karlsruhe (June
%2003)\\
%\url{http://samizdat.nongnu.org/slides/euruko2003\_samizdat.html}

%\bibitem {samizdat-impl-report}
%Borodaenko, Dmitry:
%Samizdat RDF Implementation Report. RDF Interest ML (September 2003)\\
%\url{http://lists.w3.org/Archives/Public/www-rdf-interest/2003Sep/0043.html}

\bibitem {samizdat-rel-rdf}
Borodaenko, Dmitry:
Accessing Relational Data with RDF Queries and Assertions (April 2004)\\
\url{http://samizdat.nongnu.org/papers/rel-rdf.pdf}

\bibitem {samizdat-collreif}
Borodaenko, Dmitry:
Model for Collaborative Decision Making Based on RDF Reification (April
2004)\\
\url{http://samizdat.nongnu.org/papers/collreif.pdf}

\bibitem {cyganiak}
Cyganiak, R.:
A relational algebra for SPARQL. Technical Report HPL-2005-170, HP Labs
(2005)\\
\url{http://www.hpl.hp.com/techreports/2005/HPL-2005-170.html}

\bibitem {virtuoso}
Erling, O., Mikhailov, I.:
RDF support in the Virtuoso DBMS. In: Proceedings of the 1st Conference
on Social Semantic Web, volume P-113 of GI-Edition -- Lecture Notes in
Informatics (LNI), ISSN 1617-5468. Bonner K\"{o}llen Verlag (2007)\\
\url{http://virtuoso.openlinksw.com/dav/wiki/Main/VOSArticleRDF}

%\bibitem {rdf-mt}
%Hayes, Patrick:
%RDF Semantics. W3C Recommendation (February 2004)\\
%\url{http://www.w3.org/TR/rdf-mt/}

%\bibitem {rdf-syntax-1999}
%Lassila, O., Swick, R.~R.:
%Resource Description Framework (RDF) Model and Syntax Specification, W3C
%Recommendation (February 1999)\\
%\url{http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/}

%\bibitem {rdb2rdf-xg-report}
%Malhotra, Ashok:
%W3C RDB2RDF Incubator Group Report. W3C Incubator Group Report (January
%2009)\\
%\url{http://www.w3.org/2005/Incubator/rdb2rdf/XGR-rdb2rdf/}

%\bibitem {melnik}
%Melnik, S.:
%Storing RDF in a relational database. Stanford University (2001)\\
%\url{http://infolab.stanford.edu/~melnik/rdf/db.html}

\bibitem {squish}
Miller, Libby, Seaborne, Andy, Reggiori, Alberto:
Three Implementations of SquishQL, a Simple RDF Query Language. In:
Horrocks, I., Hendler, J. (Eds.) ISWC 2002. LNCS vol. 2342, pp. 423--435.
Springer, Heidelberg (2002)\\
\url{http://ilrt.org/discovery/2001/02/squish/}

%\bibitem {nuutila}
%Nuutila, Esko:
%Efficient Transitive Closure Computation in Large Digraphs. Acta
%Polytechnica Scandinavica, Mathematics and Computing in Engineering
%Series No. 74, Helsinki (1995)\\
%\url{http://www.cs.hut.fi/~enu/thesis.html}

%\bibitem {owl-semantics}
%Patel-Schneider, Peter F., Hayes, Patrick, Horrocks, Ian:
%OWL Web Ontology Language Semantics and Abstract Syntax. W3C
%Recommendation (February 2004)\\
%\url{http://www.w3.org/TR/owl-semantics/}

\bibitem {federate}
Prud'hommeaux, Eric:
RDF Access to Relational Databases (2003)\\
\url{http://www.w3.org/2003/01/21-RDF-RDB-access/}

\bibitem {sparql}
Prud'hommeaux, Eric, Seaborne, Andy:
SPARQL Query Language for RDF. W3C Recommendation (January 2008)\\
\url{http://www.w3.org/TR/rdf-sparql-query/}

\bibitem {shapiro}
Shapiro, L., Stockman, G.:
Computer Vision, pp. 69--73. Prentice-Hall (2002)\\
\url{http://www.cse.msu.edu/~stockman/Book/2002/Chapters/ch3.pdf}

\bibitem {sp2b}
Schmidt, M., Hornung, T., K\"{u}chlin, N., Lausen, G., Pinkel, C.:
An Experimental Comparison of RDF Data Management Approaches in a SPARQL
Benchmark Scenario. In: A. Sheth et al. (Eds.) ISWC 2008. LNCS vol.
5318, pp. 82--97. Springer, Heidelberg (2008)\\
\url{http://www.informatik.uni-freiburg.de/~mschmidt/docs/sp2b\_exp.pdf}

%\bibitem {treehugger}
%Steer, D.:
%TreeHugger -- XSLT for RDF (2003)\\
%\url{http://rdfweb.org/people/damian/treehugger/}

\end{thebibliography}
\end{document}