\documentclass{llncs}
\usepackage{makeidx} % allows for index generation
\usepackage{graphicx}
\usepackage[pdfpagescrop={92 112 523 778},a4paper=false,
            pdfborder={0 0 0}]{hyperref}
\emergencystretch=8pt
%
\begin{document}
\mainmatter % start of the contributions
%
\title{Accessing Relational Data with RDF Queries and Assertions}
\toctitle{Accessing Relational Data with RDF Queries and Assertions}
\titlerunning{Accessing Relational Data with RDF}
%
\author{Dmitry Borodaenko}
\authorrunning{Dmitry Borodaenko} % abbreviated author list (for running head)
%%%% modified list of authors for the TOC (add the affiliations)
\tocauthor{Dmitry Borodaenko}
%
\institute{\email{angdraug@debian.org}}

\maketitle % typeset the title of the contribution

\begin{abstract}
This paper presents a hybrid RDF storage model that combines relational data
with arbitrary RDF meta-data, as implemented in the RDF storage layer of the
Samizdat open publishing and collaboration engine, and explains the supporting
algorithms for online translation of RDF queries and conditional assertions
into their relational equivalents. The proposed model makes it possible to
supplement legacy databases with RDF meta-data without sacrificing the
benefits of RDBMS technology.
\end{abstract}


\section{Introduction}

The survey of free software / open source RDF storage systems performed by
SWAD-Europe\cite{swad-storage} found that the most widespread approach to
RDF storage relies on relational databases. As seen from the companion report
on mapping Semantic Web data with RDBMSes\cite{swad-rdbms-mapping}, the
traditional relational representation of RDF is a triple store, usually
revolving around a central statement table with \{subject, predicate, object\}
triples as its rows and one or more tables storing resource URIrefs,
namespaces, and other supplementary data.

While such a triple store approach serves well to satisfy the open world
assumption of RDF, by abandoning existing relational data models it fails to
take full advantage of RDBMS technology. According to \cite{swad-storage},
existing RDF storage tools are still immature; at the same time, although
modern triple stores claim to scale to millions of triples, ICS-FORTH
research\cite{ics-volume} shows that a schema-specific storage model yields
better results with regard to performance and scalability on large volumes of
data.

These concerns are addressed from different angles by the RSSDB\cite{rssdb},
Federate\cite{ericp-rdf-rdb-access}, and D2R\cite{d2r} packages. RSSDB splits
the single triples table into a schema-specific set of property tables. In
this way, it moves away from the relational data model, but retains the
performance benefits of better indexing. Federate takes the most conservative
approach and allows a relational database to be queried through a restricted
application-specific RDF schema. Conversely, D2R is intended for batch export
of data from an RDBMS to RDF and assumes that subsequent operation will
involve only RDF.

The hybrid RDF storage model presented in this paper attacks this problem from
yet another angle, which can be described as a combination of Federate's
relational-to-RDF mapping and a traditional triple store. While designed from
the ground up with the RDF model in mind, the Samizdat RDF
layer\cite{samizdat-rdf-storage} deviates from common RDF storage practice in
order to use both relational and triple data models and get the best of both
worlds. A hybrid storage model was designed, and algorithms were implemented
that give access to the data in the hybrid triple-relational model through RDF
queries and conditional assertions in an extended variant of the
Squish\cite{squish} query language.\footnote{The decision to use Squish
over more expressive languages like RDQL\cite{rdql} and
Notation3\cite{notation3} was made due to its intuitive syntax, which was
found more suitable for Samizdat's query composer GUI intended for end
users of an open-publishing system.} This paper describes the proposed model
and its implementation in the Samizdat engine.


\section{Relational Database Schema}

All content in a Samizdat site is represented internally as RDF. The canonical
URIref for any Samizdat resource is {\tt http://<site-url>/<resource-id>},
where {\tt <site-url>} is the base URL of the site and {\tt <resource-id>} is a
numeric identifier of the resource, unique within a single site.

The root of the SQL representation of RDF resources is the {\tt Resource}
table, with an {\tt id} primary key field storing {\tt <resource-id>} and a
{\tt label} text field representing the resource label. The semantics of label
values differ for literals, references to external resources, and internal
resources of the site.

A \emph{literal} value (including typed literals) is stored directly in the
{\tt label} field and marked with the {\tt literal} boolean field.

An \emph{external resource} label contains the resource URIref and is marked
with the {\tt uriref} boolean field.

An \emph{internal resource} is mapped to a row in an \emph{internal resource
table} whose name corresponds to the resource class name stored in the {\tt
label} field. The table has a primary key {\tt id} field referencing back to
the {\tt Resource} table, and other fields holding values of \emph{internal
properties} for this resource class, represented as literals or references to
other resources stored in the {\tt Resource} table. The primary key reference
to {\tt Resource.id} is enforced by PostgreSQL stored procedures.

To determine what information about a resource can be stored in and extracted
from class-specific tables, the RDF storage layer consults a site-specific
mapping
\begin{equation}
M(p) = \{\langle t_{p1},~f_{p1} \rangle, \enspace \dots\} \enspace ,
\end{equation}
which stores a list of possible pairs of SQL table name $t$ and field name $f$
for each internal property name $p$. The mapping $M$ is read at runtime from
an external YAML\cite{yaml} file of the following form:

\begin{verbatim}
---
ns:
  s: 'http://www.nongnu.org/samizdat/rdf/schema#'
  focus: 'http://www.nongnu.org/samizdat/rdf/focus#'
  items: 'http://www.nongnu.org/samizdat/rdf/items#'
  rdf: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  dc: 'http://purl.org/dc/elements/1.1/'

map:
  'dc::date': {Resource: published_date}
  's::id': {Resource: id}

  'rdf::subject': {Statement: subject}
  'rdf::predicate': {Statement: predicate}
  'rdf::object': {Statement: object}

  's::rating': {Statement: rating}

  . . .
\end{verbatim}

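As a minimal sketch of how $M$ might be consulted, the following Python
fragment hard-codes the entries from the excerpt above instead of parsing the
YAML file. The names {\tt PROPERTY\_MAP} and {\tt lookup} are illustrative
assumptions and do not come from the Samizdat sources.

\begin{verbatim}
# Hypothetical in-memory form of the mapping M:
# property QName -> list of (table, field) pairs.
# Entries mirror the YAML excerpt; a real map is read from a file.
PROPERTY_MAP = {
    "dc::date": [("Resource", "published_date")],
    "s::id": [("Resource", "id")],
    "rdf::subject": [("Statement", "subject")],
    "rdf::predicate": [("Statement", "predicate")],
    "rdf::object": [("Statement", "object")],
    "s::rating": [("Statement", "rating")],
}


def lookup(prop):
    """Return M(p), or an empty list for an unmapped property."""
    return PROPERTY_MAP.get(prop, [])


print(lookup("dc::date"))      # internal: one (table, field) pair
print(lookup("dc::relation"))  # not in this map: empty list
\end{verbatim}

A property that is absent from the map (in this sketch, {\tt dc::relation})
yields an empty list, which signals that it is not stored in a class-specific
table.
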
\emph{External properties}, i.e.\ properties that are not covered by $M$, are
represented by \{{\tt subject}, {\tt predicate}, {\tt object}\} triples in the
{\tt Statement} table. Every such triple is treated as a reified statement in
RDF semantics and is assigned a {\tt <resource-id>} and a record in the {\tt
Resource} table.

{\tt Resource} and {\tt Statement} are also internal resource tables and, as
such, have some of their fields mapped by $M$. In particular, the {\tt
subject}, {\tt predicate}, and {\tt object} fields of the {\tt Statement}
table are mapped to the corresponding properties from the RDF reification
vocabulary, and {\tt Resource.id} is mapped to the {\tt samizdat:id} property
from the Samizdat namespace.

An excerpt from the default Samizdat database schema, with mapped field names
replaced by predicate QNames, is shown in Fig.\,\ref{db-schema-figure}. In
addition to the {\tt Resource} and {\tt Statement} tables described above, it
shows the {\tt Message} table representing one of the internal resource
classes. Note how the {\tt dc:date} property is made available to all resource
classes, and how reified statements are allowed to have an optional {\tt
samizdat:rating} property.

\begin{figure}
%\begin{verbatim}
%    +-------------+    +-----------------+
%    | Resource    |    | Statement       |
%    +-------------+    +-----------------+
% +->| samizdat:id |<-+-| id              |
% |  | label       |  +-| rdf:subject     |
% |  | literal     |  +-| rdf:predicate   |
% |  | uriref      |  +-| rdf:object      |
% |  | dc:date     |    | samizdat:rating |
% |  +-------------+    +-----------------+
% |
% |  +------------------+
% |  | Message          |
% |  +------------------+
% +--| id               |
%    | dc:title         |
%    | dc:format        |
%    | samizdat:content |
%    +------------------+
%\end{verbatim}
\begin{center}
\includegraphics[scale=0.6]{fig1.eps}
\end{center}
\caption{Excerpt from Samizdat database schema}
\label{db-schema-figure}
\end{figure}


\section{Query Pattern Translation}
%
\subsection{Prerequisites}

The pattern translation algorithm operates on the pattern section of a Squish
query. A query pattern $\Psi$ is represented as a list of \emph{pattern clauses}
\begin{equation}
\psi_i = \langle p_i,~s_i,~o_i \rangle \enspace ,
\end{equation}
where $i$ is the position of a clause, $p_i$ is the predicate URIref, $s_i$ is
the subject node, which may be a URIref or a blank node, and $o_i$ is the
object node, which may be a URIref, a blank node, or a literal.

\subsection{Predicate Mapping}

For each position $i$, the predicate URIref $p_i$ is looked up in the map of
internal resource properties $M$. All possible mappings are recorded for all
clauses in a list $C$:
\begin{equation}
c_i = \{\langle t_{i1},~f_{i1} \rangle, \enspace \langle t_{i2},~f_{i2}
\rangle, \enspace \dots\} \enspace ,
\end{equation}
where $t_{ij}$ is the table name (the same for subject $s_i$ and object $o_i$)
and $f_{ij}$ is the field name (meaningful for the object only, since the
subject is always mapped to the {\tt id} primary key). In the same iteration,
all subject and object positions of nodes are recorded in the reverse
positional mapping
\begin{equation}
R(n) = \{\langle i_1,~m_1 \rangle, \enspace \langle i_2,~m_2 \rangle, \enspace
\dots\} \enspace ,
\end{equation}
where $m$ shows whether node $n$ appears as subject or as object in clause
$i$.

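The single pass described above might be sketched in Python as follows. The
three-clause pattern and the contents of {\tt PROPERTY\_MAP} are assumptions
chosen for illustration; positions are numbered from 1 as in the text, while
the Python list {\tt C} is indexed from 0.

\begin{verbatim}
# Assumed excerpt of M; the real map is site-specific.
PROPERTY_MAP = {
    "dc::title": [("Message", "title")],
    "dc::date": [("Resource", "published_date")],
    "rdf::subject": [("Statement", "subject")],
}

# Pattern clauses as (predicate, subject, object) triples.
PATTERN = [
    ("dc::title", "?msg", "?title"),
    ("dc::date", "?msg", "?date"),
    ("rdf::subject", "?stmt", "?msg"),
]


def map_predicates(pattern, property_map):
    """Build the clause list C and the reverse mapping R."""
    clauses = []    # C: candidate (table, field) pairs per clause
    positions = {}  # R: node -> [(i, "subject" or "object")]
    for i, (pred, subj, obj) in enumerate(pattern, start=1):
        clauses.append(list(property_map.get(pred, [])))
        positions.setdefault(subj, []).append((i, "subject"))
        positions.setdefault(obj, []).append((i, "object"))
    return clauses, positions


C, R = map_predicates(PATTERN, PROPERTY_MAP)
print(C[0])       # candidates for clause 1
print(R["?msg"])  # every position where ?msg occurs
\end{verbatim}
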
Each ambiguous property mapping is compared with the mappings for other
occurrences of the same subject and object nodes in the pattern graph; whenever
a non-empty intersection of mappings for the same node is found, both the
subject and object mappings for the ambiguous property are refined to that
intersection.

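A simplified Python sketch of this refinement is given below. It considers
only occurrences of a node in subject position and ignores the special role of
the {\tt Resource} table; the {\tt Focus} table and the function name {\tt
refine} are hypothetical.

\begin{verbatim}
def refine(clauses, subjects):
    """Narrow ambiguous mappings via clauses with the same subject.

    clauses[i]  -- candidate (table, field) pairs for clause i
    subjects[i] -- subject node of clause i
    """
    refined = [list(c) for c in clauses]
    for i, candidates in enumerate(refined):
        if len(candidates) < 2:
            continue  # already unambiguous
        for j, other in enumerate(refined):
            if i == j or subjects[i] != subjects[j]:
                continue
            shared = ({t for t, _ in candidates}
                      & {t for t, _ in other})
            if shared:  # refine only on a non-empty intersection
                candidates = [(t, f) for t, f in candidates
                              if t in shared]
        refined[i] = candidates
    return refined


# Hypothetical map in which dc::title lives in two class tables.
CLAUSES = [
    [("Message", "title"), ("Focus", "title")],  # dc::title
    [("Message", "creator")],                    # dc::creator
]
SUBJECTS = ["?msg", "?msg"]
print(refine(CLAUSES, SUBJECTS))
\end{verbatim}

Here the unambiguous {\tt dc::creator} clause on the same subject narrows the
candidates for {\tt dc::title} to the {\tt Message} table.
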
\subsection{Relation Aliases and Join Conditions}

A relation alias $a_i$ is determined for each clause mapping $c_i$, such that
all occurrences of $s_i$ in subject position that are mapped to the same table
$t_i$ share one alias, while positions that differ in table mapping or in
subject node get different aliases.

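One possible way to assign aliases, sketched in Python under the assumption
that every clause has already been refined to a single table, is to key each
alias by the pair of subject node and table. The order in which letters are
handed out here is arbitrary and need not match the aliases chosen by
Samizdat.

\begin{verbatim}
from string import ascii_lowercase


def assign_aliases(clauses):
    """Return one alias per (subject, table) clause.

    Clauses with the same subject node and table share an alias;
    any other combination gets a fresh one (at most 26 here).
    """
    aliases = {}
    result = []
    for subject, table in clauses:
        key = (subject, table)
        if key not in aliases:
            aliases[key] = ascii_lowercase[len(aliases)]
        result.append(aliases[key])
    return result


CLAUSES = [
    ("?msg", "Message"), ("?msg", "Message"),
    ("?author", "Member"), ("?msg", "Resource"),
    ("?stmt", "Statement"), ("?stmt", "Statement"),
    ("?stmt", "Statement"), ("?stmt", "Statement"),
]
print(assign_aliases(CLAUSES))
\end{verbatim}

For these eight clauses the grouping agrees with
Table~\ref{mappings-table}, although the letters differ.
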
For all nodes $n$ that are mapped to more than one $\langle a_i,~f_i \rangle$
pair in different positions, join conditions are generated. Additionally, for
each external resource, the {\tt Resource} table is joined by URIref, and for
each existential blank node that is not already bound by a join, a {\tt NOT
NULL} condition is generated. The resulting set of join conditions $J$ is used
to generate the {\tt WHERE} section of the target SQL query.

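The generation of join conditions for variables can be sketched as follows.
This Python fragment is an illustration only: it takes the columns bound to
each node as given, and it omits the joins with the {\tt Resource} table for
external resources.

\begin{verbatim}
def join_conditions(bindings):
    """Return SQL conditions for nodes bound to table columns.

    bindings maps each node to the non-empty list of
    'alias.field' columns it is bound to, in clause order.
    """
    conditions = []
    for node, columns in bindings.items():
        unique = list(dict.fromkeys(columns))  # drop repeats
        first = unique[0]
        if len(unique) > 1:
            # the same node in several columns: join them
            conditions += [f"{first} = {c}" for c in unique[1:]]
        elif node.startswith("?"):
            # a variable bound only once must still exist
            conditions.append(f"{first} IS NOT NULL")
    return conditions


BINDINGS = {
    "?msg": ["b.id", "b.id", "c.id", "a.subject"],
    "?author": ["b.creator", "d.id"],
    "?title": ["b.title"],
}
print(" AND ".join(join_conditions(BINDINGS)))
\end{verbatim}
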
\subsection{Example}

The following Squish query selects all messages with a rating of at least 1:

\begin{verbatim}
SELECT ?msg, ?title, ?name, ?date, ?rating
WHERE (dc::title ?msg ?title)
      (dc::creator ?msg ?author)
      (s::fullName ?author ?name)
      (dc::date ?msg ?date)
      (rdf::subject ?stmt ?msg)
      (rdf::predicate ?stmt dc::relation)
      (rdf::object ?stmt focus::Quality)
      (s::rating ?stmt ?rating)
LITERAL ?rating >= 1
ORDER BY ?rating
USING rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
      dc FOR http://purl.org/dc/elements/1.1/
      s FOR http://www.nongnu.org/samizdat/rdf/schema#
      focus FOR http://www.nongnu.org/samizdat/rdf/focus#
\end{verbatim}

The mappings produced by translation of this query are summarized in
Table~\ref{mappings-table}.

\begin{table}
\caption{Query Translation Mappings}
\label{mappings-table}
\begin{center}
\begin{tabular}{clll}
\hline\noalign{\smallskip}
$i$ & $t_i$ & $f_i$ & $a_i$\\
\noalign{\smallskip}
\hline
\noalign{\smallskip}
1 & {\tt Message} & {\tt title} & {\tt b}\\
2 & {\tt Message} & {\tt creator} & {\tt b}\\
3 & {\tt Member} & {\tt full\_name} & {\tt d}\\
4 & {\tt Resource} & {\tt published\_date} & {\tt c}\\
5 & {\tt Statement} & {\tt subject} & {\tt a}\\
6 & {\tt Statement} & {\tt predicate} & {\tt a}\\
7 & {\tt Statement} & {\tt object} & {\tt a}\\
8 & {\tt Statement} & {\tt rating} & {\tt a}\\
\hline
\end{tabular}
\end{center}
\end{table}

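As a worked instance of the notation, reading the clauses of the query above
in order, the variable {\tt ?msg} occurs as the subject of clauses 1, 2, and 4
and as the object of clause 5. Writing $s$ and $o$ for the two possible values
of $m$ (a notation assumed here for illustration only), this gives
\begin{equation}
R(\mathtt{?msg}) = \{\langle 1,~s \rangle, \enspace \langle 2,~s \rangle,
\enspace \langle 4,~s \rangle, \enspace \langle 5,~o \rangle\} \enspace ,
\end{equation}
while the first row of Table~\ref{mappings-table} corresponds to the clause
mapping $c_1 = \{\langle \mathtt{Message},~\mathtt{title} \rangle\}$.
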
As a result of translation, the following SQL query is generated:

\begin{verbatim}
SELECT b.id, b.title, d.full_name, c.published_date, a.rating
FROM Statement a, Message b, Resource c, Member d,
     Resource e, Resource f
WHERE a.id IS NOT NULL
  AND a.object = e.id AND e.literal = false
  AND e.uriref = true AND e.label = 'focus::Quality'
  AND a.predicate = f.id AND f.literal = false
  AND f.uriref = true AND f.label = 'dc::relation'
  AND a.rating IS NOT NULL
  AND b.creator = d.id
  AND b.id = a.subject
  AND b.id = c.id
  AND b.title IS NOT NULL
  AND c.published_date IS NOT NULL
  AND d.full_name IS NOT NULL
  AND (a.rating >= 1)
ORDER BY a.rating
\end{verbatim}

\subsection{Limitations}

In RDF model theory\cite{rdf-mt}, a resource may belong to more than one
class. In the Samizdat RDF storage model, the resource class specified in {\tt
Resource.label} is treated as the primary class: it is not possible to have
some of the internal properties of a resource mapped to one table and other
internal properties mapped to another. The only exception is the {\tt
Resource} table, which is shared by all resource classes.

Predicates with cardinality greater than 1 cannot be mapped to internal
resource tables and should be recorded as reified statements instead.

RDF properties are allowed to be mapped to more than one internal resource
table, and queries on such ambiguous properties are intended to select all
classes of resources that match this property in conjunction with the rest of
the query.

The algorithm described above assumes that other pattern clauses refine such
an ambiguous property mapping to one internal resource table. Queries that
violate this assumption are translated incorrectly by the current
implementation: only the resource class from the first remaining mapping is
matched. This should be taken into account in site-specific resource maps:
ambiguous properties should be avoided where possible, and their mappings
should be listed in descending order of resource class probability.

It is possible to solve this problem, but any precise solution will add
significant complexity to the resulting query. Solutions that would not
adversely affect performance are still being sought. For now, it is recommended
not to specify more than one mapping per internal property.


\section{Conditional Assertion}
%
\subsection{Prerequisites}

A conditional assertion statement in Samizdat Squish is recorded using the same
syntax as an RDF query, with the {\tt SELECT} section containing the variable
list replaced by an {\tt INSERT} section with a list of ``don't-bind''
variables and an {\tt UPDATE} section containing assignments of values to query
variables:

\begin{verbatim}
[ INSERT node [, ...] ]
[ UPDATE node = value [, ...] ]
WHERE (predicate subject object) [...]
[ USING prefix FOR namespace [...] ]
\end{verbatim}

Initially, the pattern clauses in an assertion are translated using the same
procedure as for a query. The pattern $\Psi$, clause mapping $C$, reverse
positional mapping $R$, alias list $A$, and join conditions set $J$ are
generated as described in the previous section.

After that, the database update is performed in the two stages described
below. Both stages are executed within a single transaction, so that
intermediate inserts and updates are rolled back if the assertion fails.

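The all-or-nothing behaviour can be illustrated with Python's standard {\tt
sqlite3} module standing in for PostgreSQL; the table layout and the two stage
functions are assumptions made for this sketch.

\begin{verbatim}
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Resource (id INTEGER PRIMARY KEY, label TEXT)")
conn.commit()


def run_assertion(conn, stages):
    """Run all stages in one transaction; undo all on failure."""
    try:
        with conn:  # commit on success, roll back on exception
            for stage in stages:
                stage(conn)
        return True
    except sqlite3.Error:
        return False


def insert_resource(conn):
    conn.execute("INSERT INTO Resource (label) VALUES ('Message')")


def failing_update(conn):
    # NoSuchTable does not exist, so this raises OperationalError
    conn.execute("UPDATE NoSuchTable SET label = 'x'")


ok = run_assertion(conn, [insert_resource, failing_update])
count = conn.execute("SELECT COUNT(*) FROM Resource").fetchone()[0]
print(ok, count)
\end{verbatim}

Because the second stage fails, the insert made by the first stage is rolled
back and the {\tt Resource} table stays empty.
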
\subsection{Resource Values}

At this stage, the value mapping $V(n)$ is defined for each node $n$, and the
necessary resource insertions are performed:

\begin{enumerate}
\item If $n$ is an internal resource, $V(n)$ is its {\tt id}. If there is no
resource with such an {\tt id} in the database, an error is raised.
\item If $n$ is a literal, $V(n)$ is the literal value.
\item If $n$ is a blank node and only appears in object position, it is
assigned a value from the {\tt UPDATE} section of the assertion.
\item If $n$ is a blank node and appears in subject position, it is either
looked up in the database or inserted as a new resource. If no resource in the
database matches $n$ (to check that, the subgraph of $\Psi$ including all
pattern nodes and predicates reachable from $n$ is generated and matched
against the database), or if $n$ appears in the {\tt INSERT} section of the
assertion, a new resource is created and its {\tt id} is assigned to $V(n)$. If
a matching resource is found, $V(n)$ becomes equal to its {\tt id}.
\item If $n$ is an external URIref, it is looked up in the {\tt Resource}
table. As with subject blank nodes, $V(n)$ is the {\tt id} of a matching or
new resource.
\end{enumerate}

All nodes that were inserted during this stage are recorded in the set of new
nodes $N$.

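The five cases above can be sketched in Python against a toy in-memory store.
The class {\tt ToyStore}, the node kinds, and the {\tt match} callback (a
stand-in for matching the subgraph reachable from $n$ against the database)
are all assumptions of this sketch, not the Samizdat implementation.

\begin{verbatim}
import itertools


class ToyStore:
    """In-memory stand-in for the Resource table (illustrative)."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.ids = set()
        self.by_uriref = {}  # URIref -> id

    def create(self, uriref=None):
        new_id = next(self._next_id)
        self.ids.add(new_id)
        if uriref is not None:
            self.by_uriref[uriref] = new_id
        return new_id


def resource_value(kind, name, store, new_nodes, *,
                   is_subject=False, updates=None, insert=(),
                   match=lambda name: None):
    """Return V(n) for one node, following the five cases."""
    if kind == "internal":    # case 1: the id must exist
        if name not in store.ids:
            raise LookupError(f"no resource with id {name}")
        return name
    if kind == "literal":     # case 2: the value itself
        return name
    if kind == "uriref":      # case 5: look up or insert
        if name not in store.by_uriref:
            store.create(uriref=name)
            new_nodes.add(name)
        return store.by_uriref[name]
    if kind != "blank":
        raise ValueError(f"unknown node kind: {kind}")
    if not is_subject:        # case 3: value from UPDATE
        return (updates or {})[name]  # KeyError if missing
    # case 4: forced insert, or insert when nothing matches
    found = None if name in insert else match(name)
    if found is None:
        found = store.create()
        new_nodes.add(name)
    return found


store, new_nodes = ToyStore(), set()
uri = "http://example.org/a"
print(resource_value("uriref", uri, store, new_nodes))
print(resource_value("blank", "?msg", store, new_nodes,
                     is_subject=True))
print(resource_value("blank", "?title", store, new_nodes,
                     updates={"?title": "Hello"}))
print(sorted(new_nodes))
\end{verbatim}

The set {\tt new\_nodes} plays the role of $N$: it collects exactly those nodes
for which a resource had to be created.
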
\subsection{Data Assignment}

For all aliases from $A$, except the additional aliases that are defined for
external URIref nodes (which do not have to be looked up since their {\tt id}s
are recorded in $V$ during the previous stage), the reverse positional mapping
\begin{equation}
R_\mathrm{A}(a) = \{i_1, \enspace i_2, \enspace \dots\}
\end{equation}
is defined. The key node $K$ is defined as the subject node $s_{i_1}$ from
clause $\psi_{i_1}$, and the aliased table $t$ is defined as the table name
$t_{i_1}$ from clause mapping $c_{i_1}$.

For each position $k$ from $R_\mathrm{A}(a)$, a pair $\langle f_k, V(o_k)
\rangle$, where $f_k$ is the field name from $c_k$ and $o_k$ is the object node
from $\psi_k$, is added to the data assignment list $D(K)$ if node $o_k$
occurs in the set of new nodes $N$ or in the {\tt UPDATE} section of the
assertion statement.

If the key node $K$ occurs in $N$, a new row is inserted into the table $t$. If
$K$ is not in $N$, but $D(K)$ is not empty, an SQL update statement is
generated for the row of $t$ with {\tt id} equal to $V(K)$. In both cases,
assignments are generated from the data assignment list $D(K)$.

The above procedure is repeated for each alias $a$ included in $R_\mathrm{A}$.

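As a rough sketch, the statement generated for one alias could be built as
follows in Python. The function name, the explicit {\tt id} column in the
insert, and the {\tt ?} parameter placeholders are assumptions; table and
field names are taken to come from the trusted mapping $M$, never from user
input.

\begin{verbatim}
def assignment_sql(table, key_id, assignments, key_is_new):
    """Build one parameterised statement for an aliased table.

    assignments -- the list D(K) of (field, value) pairs
    key_is_new  -- True if the key node K is in the set N
    Returns (sql, params), or (None, []) if nothing changes.
    """
    fields = [field for field, _ in assignments]
    values = [value for _, value in assignments]
    if key_is_new:
        columns = ", ".join(["id"] + fields)
        marks = ", ".join("?" for _ in range(len(fields) + 1))
        sql = f"INSERT INTO {table} ({columns}) VALUES ({marks})"
        return sql, [key_id] + values
    if not assignments:
        return None, []
    sets = ", ".join(f"{field} = ?" for field in fields)
    sql = f"UPDATE {table} SET {sets} WHERE id = ?"
    return sql, values + [key_id]


D = [("title", "Hello")]
print(assignment_sql("Message", 7, D, key_is_new=True))
print(assignment_sql("Message", 7, D, key_is_new=False))
\end{verbatim}
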
\subsection{Iterative Assertions}

If the assertion pattern matches more than once in the site knowledge base,
the algorithm defined in this section will nevertheless run the appropriate
insertions and updates only once. For an iterative update of all occurrences
of the pattern, the assertion has to be programmatically wrapped inside an
appropriate RDF query.


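Such a wrapper might look like the following Python sketch, in which {\tt
select} and {\tt assert\_once} are hypothetical callables standing for the
execution of an RDF query and of a single conditional assertion.

\begin{verbatim}
def assert_for_each(select, assert_once):
    """Apply assert_once to every binding returned by select."""
    applied = 0
    for binding in select():
        assert_once(binding)
        applied += 1
    return applied


# Toy usage: two pattern matches, each updated exactly once.
ratings = {"msg1": 0, "msg2": 0}


def matches():
    return [{"?msg": "msg1"}, {"?msg": "msg2"}]


def bump(binding):
    ratings[binding["?msg"]] += 1


print(assert_for_each(matches, bump), ratings)
\end{verbatim}
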
\section{Implementation Details}

The Samizdat engine\cite{samizdat-impl-report} is written in the Ruby
programming language and uses a PostgreSQL database for storage and an
assortment of Ruby libraries for database access (DBI), configuration and RDF
mapping (YAML), localization (GetText), and the Pingback protocol (XML-RPC). It
runs on a variety of platforms ranging from Debian GNU/Linux to Windows
98/Cygwin. Samizdat is free software and is available under the GNU General
Public License, version 2 or later.

Development of the Samizdat project started in December 2002, and the first
public release was announced in June 2003. As of the second beta version
0.5.1, released in March 2004, Samizdat provided a basic set of open
publishing functionality, including registering site members, publishing and
replying to messages, uploading multimedia messages, voting on the relation of
site focuses to resources, creating and managing new focuses, and hand-editing
or using a GUI for constructing and publishing Squish queries that can be used
to search and filter site resources.

\section{Conclusions}

Wide adoption of the Semantic Web requires interoperability between relational
databases and RDF applications. Existing RDF stores treat relational data as
legacy and require that it be recorded in triples before being processed, with
the exception of the Federate system, which provides limited direct access to
relational data via an application-specific RDF schema.

The Samizdat RDF storage layer provides an intermediate solution to this
problem by combining relational databases with arbitrary RDF meta-data. The
described approach makes it possible to take advantage of RDBMS transactions,
replication, performance optimizations, etc., in Semantic Web applications,
and reduces the cost of migration from the relational data model to RDF.

As can be seen from the corresponding sections of this paper, the current
implementation of the proposed approach has several limitations. These
limitations are not inherent in the approach itself, but rather reflect the
pragmatic decision to implement only the functionality that is used by the
Samizdat engine. As more advanced collaboration features such as message
versioning and aggregation are added to Samizdat, some of the limitations of
its RDF storage layer will be removed.


% ---- Bibliography ----
%
\begin{thebibliography}{19}
%
\bibitem{ics-volume}
Alexaki, S., Christophides, V., Karvounarakis, G., Plexousakis, D., Tolle, K.:
The RDFSuite: Managing Voluminous RDF Description Bases, Technical report,
ICS-FORTH, Heraklion, Greece, 2000.\\
http://139.91.183.30:9090/RDF/publications/semweb2001.html

\bibitem{swad-storage}
Beckett, D.:
Semantic Web Scalability and Storage: Survey of Free Software / Open Source
RDF storage systems, SWAD-Europe Deliverable 10.1\\
http://www.w3.org/2001/sw/Europe/reports/rdf\_scalable\_storage\_report

\bibitem{swad-rdbms-mapping}
Beckett, D., Grant, J.:
Semantic Web Scalability and Storage: Mapping Semantic Web Data with RDBMSes,
SWAD-Europe Deliverable 10.2\\
http://www.w3.org/2001/sw/Europe/reports/scalable\_rdbms\_mapping\_report

\bibitem{yaml}
Ben-Kiki, O., Evans, C., Ingerson, B.:
YAML Ain't Markup Language (YAML) 1.0. Working Draft 2004-JAN-29.\\
http://www.yaml.org/spec/

\bibitem{notation3}
Berners-Lee, T.:
Notation3 --- Ideas about Web architecture\\
http://www.w3.org/DesignIssues/Notation3

\bibitem{d2r}
Bizer, C.:
D2R MAP --- Database to RDF Mapping Language and Processor\\
http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/D2Rmap.htm

\bibitem{samizdat-rdf-storage}
Borodaenko, D.:
Samizdat RDF Storage, December 2002\\
http://savannah.nongnu.org/cgi-bin/viewcvs/samizdat/samizdat/doc/rdf-storage.txt

\bibitem{samizdat-impl-report}
Borodaenko, D.:
Samizdat RDF Implementation Report, September 2003\\
http://lists.w3.org/Archives/Public/www-rdf-interest/2003Sep/0043.html

\bibitem{rdf-mt}
Hayes, P.:
RDF Semantics. W3C, February 2004\\
http://www.w3.org/TR/rdf-mt

\bibitem{rdql}
Jena Semantic Web Framework:
RDQL Grammar\\
http://jena.sf.net/RDQL/rdql\_grammar.html

\bibitem{ericp-rdf-rdb-access}
Prud'hommeaux, E.:
RDF Access to Relational Databases\\
http://www.w3.org/2003/01/21-RDF-RDB-access/

\bibitem{rssdb}
RSSDB --- RDF Schema Specific DataBase (RSSDB), ICS-FORTH, 2002\\
http://139.91.183.30:9090/RDF/RSSDB/

\bibitem{squish}
Miller, L., Seaborne, A., Reggiori, A.:
Three Implementations of SquishQL, a Simple RDF Query Language. 1st
International Semantic Web Conference (ISWC2002), June 9-12, 2002. Sardinia,
Italy.\\
http://ilrt.org/discovery/2001/02/squish/

\end{thebibliography}
\end{document}