graffiti 2.1

@@ -0,0 +1,126 @@
+ Samizdat RDF Implementation Report
+ ==================================
+
+ http://lists.w3.org/Archives/Public/www-rdf-interest/2003Sep/0043.html
+
+ Implementation
+ --------------
+
+ http://www.nongnu.org/samizdat/
+
+ Samizdat is a generic RDF-based engine for building collaboration and
+ open publishing web sites. Samizdat will let everyone publish, view,
+ comment on, edit, and aggregate text and multimedia resources, vote on
+ ratings and classifications, filter resources by flexible sets of
+ criteria, and cooperate and coordinate on all kinds of activities (see
+ the Design Goals document). Samizdat intends to promote the values of
+ freedom, openness, equality, and cooperation.
+
+ The Samizdat engine is implemented in the Ruby programming language on
+ top of the Apache mod_ruby module and the PostgreSQL RDBMS, and is
+ available under the GNU General Public License, version 2 or later.
+
+ Project development started in December 2002; the first public release
+ was announced in June 2003. This report refers to Samizdat 0.0.4,
+ released on 2003-09-01.
+
+ Functionality covered by this version includes: registering site
+ members; publishing and replying to messages; uploading multimedia
+ messages; voting on standard tags on resources; and constructing Squish
+ queries, either by hand or with a GUI, and publishing them so they can
+ be used to search and filter site resources.
+
+
+ RDF Schema
+ ----------
+
+ Samizdat defines its own RDF schema for describing site members,
+ published messages, votes, and other site resources (see the Concepts
+ document). One of the outstanding features of the Samizdat schema is the
+ use of statement reification to approve content classification through
+ votes cast by site members.
+
+ The Samizdat RDF schema uses Dublin Core metadata where applicable;
+ integration of site member descriptions with FOAF is also planned.
+
+ One of the problems encountered in developing the Samizdat RDF schema was
+ the lack of standard metadata describing discussion threads. While other
+ properties defined in the Samizdat schema denote Samizdat-specific
+ concepts, such as "vote" and "rating", it would be preferable to use
+ commonly agreed metadata for the threading structure instead of the
+ implementation-local "thread" and "inReplyTo" properties.
+
+
+ RDF Import and Export
+ ---------------------
+
+ While the Samizdat model follows the RDF Concepts and RDF Semantics
+ recommendations (with the exceptions listed below), the engine does
+ not exchange RDF data with external systems and thus does not use
+ RDF/XML or any other RDF serialization format. It is assumed that, when
+ the need for RDF import and export arises, it can be implemented
+ externally on top of the Samizdat RDF storage module, using existing RDF
+ frameworks such as Redland.
+
+
+ Datatyped Literals
+ ------------------
+
+ Samizdat doesn't implement datatyped literals and relies on the
+ underlying PostgreSQL capabilities for mapping between literal values and
+ their string representations. Outside of the SQL context, literals are
+ interpreted as opaque strings; XML literals are not treated specially, and
+ datatype information is not preserved.
+
+ However, support for XML Schema datatypes is considered necessary in
+ order to decouple a Samizdat knowledge base from the specifics of the
+ underlying RDF storage, and will be implemented as a prerequisite for
+ migration to alternative RDF storage backends (candidates are FramerD,
+ 3store, and Redland).
+
+
+ Language Tags
+ -------------
+
+ Literal language tags are not honoured; the "dc:language" property is
+ used instead to denote the language of a message.
+
+
+ Entailments
+ -----------
+
+ Samizdat RDF storage only implements simple entailment; vocabulary
+ entailment is not implemented yet. At the moment, simple entailment
+ suffices for all features of the Samizdat engine. If and when vocabulary
+ entailment becomes necessary, it will be implemented in the Samizdat RDF
+ storage module or relegated to an alternative RDF storage backend,
+ depending on the status of backend alternatives for Samizdat at that time.
+
+
+ Query Support
+ -------------
+
+ Samizdat RDF storage translates RDF query graphs written in extended
+ Squish into relational SQL queries, and allows selected properties of
+ site resources to be represented in a purely relational form (see the
+ RDF Storage and Storage Implementation documents).
+
+ It must be noted that, at the moment, the state of RDF query language
+ standards is unsatisfactory.
+
+ The DAML Query Language abstract specification provides an excellent
+ formal basis, but does not encompass all capabilities of existing RDF
+ query languages. Existing query languages, in turn, are limited in one
+ way or another, are underformalized (most are defined by a single
+ implementation), and are often overloaded with baroque syntax.
+
+ The two features most missed in existing query languages at the time
+ the Samizdat RDF storage was implemented were: knowledge base update,
+ allowing complex constructs to be merged into the site KB graph
+ (implemented in the Samizdat RDF Data Manipulation Language), and
+ workflow control providing at least transaction rollback (in Samizdat,
+ the underlying PostgreSQL transactions are used). Other Squish
+ extensions implemented in Samizdat are literal conditions and answer
+ collection ordering (currently relegated to PostgreSQL; ideally,
+ interpreted according to literal datatypes).
+
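The Query Support section describes translating a Squish query graph into SQL. The following deliberately simplified Ruby sketch illustrates the core idea; it is not Graffiti's algorithm. It makes several assumptions:

* Each predicate maps to a single (table, field) pair.
* The subject of a mapped triple is bound to that table's `id` column.
* Variables start with `?`.
* Any other node is already a quoted SQL value.

The property map, table names, and column names are hypothetical. The sketch leaves out subproperties, optional and negative subpatterns, reification, and alias reuse, all of which the SqlMapper class in this diff handles.

```ruby
# Hypothetical property map: predicate => [table, field]. The "prefix::name"
# spelling mimics the namespace shorthand used in the configuration; the
# names themselves are made up for illustration.
PROPERTY_MAP = {
  'dc::title'    => %w[message title],
  'dc::creator'  => %w[message creator],
  'ex::fullName' => %w[member full_name]
}.freeze

# Translate [predicate, subject, object] triples into one SQL SELECT.
# Each triple gets its own table alias: the subject is bound to the alias's
# id column and the object to the mapped field. A variable seen again
# becomes a join condition; any other node is compared as-is.
def triples_to_sql(select, triples, map = PROPERTY_MAP)
  from = []
  where = []
  bindings = {} # variable => first column it was bound to

  triples.each_with_index do |(predicate, subject, object), i|
    table, field = map.fetch(predicate) do
      raise ArgumentError, "unmapped predicate #{predicate}"
    end
    a = "t#{i}"
    from << "#{table} AS #{a}"

    [[subject, "#{a}.id"], [object, "#{a}.#{field}"]].each do |node, column|
      if !node.start_with?('?')
        where << "#{column} = #{node}"            # constant: filter
      elsif bindings[node]
        where << "#{bindings[node]} = #{column}"  # repeated variable: join
      else
        bindings[node] = column                   # first occurrence: bind
      end
    end
  end

  columns = select.map do |v|
    bindings.fetch(v) { raise ArgumentError, "unbound variable #{v}" }
  end
  sql = "SELECT #{columns.join(', ')} FROM #{from.join(', ')}"
  sql += " WHERE #{where.join(' AND ')}" unless where.empty?
  sql
end

puts triples_to_sql(%w[?msg ?name],
                    [%w[dc::creator ?msg ?author], %w[ex::fullName ?author ?name]])
# => SELECT t0.id, t1.full_name FROM message AS t0, member AS t1 WHERE t0.creator = t1.id
```

Graffiti additionally shares one alias among several properties of the same subject that are mapped to the same table (see `define_relation_aliases`). This sketch does not attempt that optimization.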
data/graffiti.gemspec ADDED
@@ -0,0 +1,21 @@
+ Gem::Specification.new do |spec|
+   spec.name = 'graffiti'
+   spec.version = '2.1'
+   spec.author = 'Dmitry Borodaenko'
+   spec.email = 'angdraug@debian.org'
+   spec.homepage = 'https://github.com/angdraug/graffiti'
+   spec.summary = 'Relational RDF store for Ruby'
+   spec.description = <<-EOF
+     Graffiti is an RDF store based on dynamic translation of RDF queries into SQL.
+     Graffiti allows one to map any relational database schema into RDF semantics
+     and vice versa, to store any RDF data in a relational database.
+
+     Graffiti uses Sequel to connect to database backend and provides a DBI-like
+     interface to run RDF queries in Squish query language from Ruby applications.
+   EOF
+   spec.files = `git ls-files`.split "\n"
+   spec.test_files = Dir['test/ts_*.rb']
+   spec.license = 'GPL-3.0-or-later'
+   spec.add_dependency('syncache')
+   spec.add_dependency('sequel')
+ end
data/lib/graffiti.rb ADDED
@@ -0,0 +1,15 @@
+ # Graffiti RDF Store
+ # (originally written for Samizdat project)
+ #
+ # Copyright (c) 2002-2009 Dmitry Borodaenko <angdraug@debian.org>
+ #
+ # This program is free software.
+ # You can distribute/modify this program under the terms of
+ # the GNU General Public License version 3 or later.
+ #
+ # see doc/rdf-storage.txt for introduction and Graffiti Squish definition;
+ # see doc/storage-impl.txt for explanation of implemented algorithms
+ #
+ # vim: et sw=2 sts=2 ts=8 tw=0
+
+ require 'graffiti/store'
@@ -0,0 +1,34 @@
+ # Graffiti RDF Store
+ # (originally written for Samizdat project)
+ #
+ # Copyright (c) 2002-2011 Dmitry Borodaenko <angdraug@debian.org>
+ #
+ # This program is free software.
+ # You can distribute/modify this program under the terms of
+ # the GNU General Public License version 3 or later.
+ #
+ # see doc/rdf-storage.txt for introduction and Graffiti Squish definition;
+ # see doc/storage-impl.txt for explanation of implemented algorithms
+ #
+ # vim: et sw=2 sts=2 ts=8 tw=0
+
+ module Graffiti
+
+ module Debug
+   private
+
+   DEBUG = false
+
+   def debug(message = nil)
+     return unless DEBUG
+
+     log message if message
+     log yield if block_given?
+   end
+
+   def log(message)
+     STDERR << 'Graffiti: ' << message.to_s << "\n"
+   end
+ end
+
+ end
@@ -0,0 +1,20 @@
+ # Graffiti RDF Store
+ # (originally written for Samizdat project)
+ #
+ # Copyright (c) 2002-2009 Dmitry Borodaenko <angdraug@debian.org>
+ #
+ # This program is free software.
+ # You can distribute/modify this program under the terms of
+ # the GNU General Public License version 3 or later.
+ #
+ # see doc/rdf-storage.txt for introduction and Graffiti Squish definition;
+ # see doc/storage-impl.txt for explanation of implemented algorithms
+ #
+ # vim: et sw=2 sts=2 ts=8 tw=0
+
+ module Graffiti
+
+ # raised for syntax errors in Squish statements
+ class ProgrammingError < RuntimeError; end
+
+ end
@@ -0,0 +1,78 @@
+ # Graffiti RDF Store
+ # (originally written for Samizdat project)
+ #
+ # Copyright (c) 2002-2011 Dmitry Borodaenko <angdraug@debian.org>
+ #
+ # This program is free software.
+ # You can distribute/modify this program under the terms of
+ # the GNU General Public License version 3 or later.
+ #
+ # see doc/rdf-storage.txt for introduction and Graffiti Squish definition;
+ # see doc/storage-impl.txt for explanation of implemented algorithms
+ #
+ # vim: et sw=2 sts=2 ts=8 tw=0
+
+ require 'graffiti/rdf_property_map'
+
+ module Graffiti
+
+ # Configuration of relational RDF storage (see examples)
+ #
+ class RdfConfig
+   def initialize(config)
+     @ns = config['ns']
+
+     @map = {}
+
+     config['map'].each_pair do |p, m|
+       table, field = m.to_a.first
+       p = ns_expand(p)
+       @map[p] = RdfPropertyMap.new(p, table, field)
+     end
+
+     if config['subproperties'].kind_of? Hash
+       config['subproperties'].each_pair do |p, subproperties|
+         p = ns_expand(p)
+         map = @map[p] or raise RuntimeError,
+           "Incorrect RDF storage configuration: superproperty #{p} must be mapped"
+         map.superproperty = true
+
+         qualifier = RdfPropertyMap.qualifier_property(p)
+         @map[qualifier] = RdfPropertyMap.new(
+           qualifier, map.table, RdfPropertyMap.qualifier_field(map.field))
+
+         subproperties.each do |subp|
+           subp = ns_expand(subp)
+           @map[subp] = RdfPropertyMap.new(subp, map.table, map.field)
+           @map[subp].subproperty_of = p
+         end
+       end
+     end
+
+     if config['transitive_closure'].kind_of? Hash
+       config['transitive_closure'].each_pair do |p, table|
+         @map[ ns_expand(p) ].transitive_closure = table
+
+         if config['subproperties'].kind_of?(Hash) and config['subproperties'][p]
+           config['subproperties'][p].each do |subp|
+             @map[ ns_expand(subp) ].transitive_closure = table
+           end
+         end
+       end
+     end
+   end
+
+   # hash of namespaces
+   attr_reader :ns
+
+   # map internal property names with expanded namespaces to RdfPropertyMap
+   # objects
+   #
+   attr_reader :map
+
+   def ns_expand(p)
+     p and p.sub(/\A(\S+?)::/) { @ns[$1] }
+   end
+ end
+
+ end
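RdfConfig points to examples that are not part of this diff. A small standalone sketch makes the shape of the configuration hash easier to see. It re-implements the same steps in plain Ruby instead of calling Graffiti:

* namespace expansion,
* one `{ table => field }` pair per property,
* the qualifier entry added for each superproperty.

The `ex` prefix, the `example.org` URI, and the table and field names are hypothetical. The optional `transitive_closure` key is left out.

```ruby
# Hypothetical configuration in the shape RdfConfig#initialize reads.
CONFIG = {
  'ns' => {
    'dc' => 'http://purl.org/dc/elements/1.1/',
    'ex' => 'http://example.org/ns#'          # placeholder namespace
  },
  'map' => {
    'dc::title'     => { 'message' => 'title' },
    'ex::relatedTo' => { 'message' => 'related' }
  },
  'subproperties' => {
    'ex::relatedTo' => ['ex::inReplyTo']      # shares table and field
  }
}

# Same substitution as RdfConfig#ns_expand: replace a leading "prefix::"
# with the namespace URI registered for that prefix.
def ns_expand(ns, p)
  p and p.sub(/\A(\S+?)::/) { ns[$1] }
end

# Flatten the configuration into expanded uriref => [table, field].
def expand_map(config)
  ns = config['ns']
  map = {}

  config['map'].each_pair do |p, m|
    table, field = m.to_a.first # single { table => field } pair
    map[ns_expand(ns, p)] = [table, field]
  end

  (config['subproperties'] || {}).each_pair do |p, subproperties|
    superp = ns_expand(ns, p)
    table, field = map.fetch(superp) # superproperty must be mapped
    # qualifier entry, named as RdfPropertyMap.qualifier_property/_field do
    map[superp + ' subproperty'] = [table, field + '_subproperty']
    subproperties.each { |subp| map[ns_expand(ns, subp)] = [table, field] }
  end

  map
end
```

In this sketch, the subproperty `ex::inReplyTo` lands in the same `message.related` field as its superproperty. The extra `related_subproperty` field, stored under the key `"http://example.org/ns#relatedTo subproperty"`, is what tells the two apart.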
@@ -0,0 +1,92 @@
+ # Graffiti RDF Store
+ # (originally written for Samizdat project)
+ #
+ # Copyright (c) 2002-2011 Dmitry Borodaenko <angdraug@debian.org>
+ #
+ # This program is free software.
+ # You can distribute/modify this program under the terms of
+ # the GNU General Public License version 3 or later.
+ #
+ # see doc/rdf-storage.txt for introduction and Graffiti Squish definition;
+ # see doc/storage-impl.txt for explanation of implemented algorithms
+ #
+ # vim: et sw=2 sts=2 ts=8 tw=0
+
+ module Graffiti
+
+ # Map of an internal RDF property into relational storage
+ #
+ class RdfPropertyMap
+
+   # special qualifier map
+   #
+   # ' ' is added to the property name to make sure it can't clash with any
+   # valid property uriref
+   #
+   def RdfPropertyMap.qualifier_property(property, type = 'subproperty')
+     property + ' ' + type
+   end
+
+   # special qualifier field
+   #
+   def RdfPropertyMap.qualifier_field(field, type = 'subproperty')
+     field + '_' + type
+   end
+
+   def initialize(property, table, field)
+     # fixme: support ambiguous mappings
+     @property = property
+     @table = table
+     @field = field
+   end
+
+   # expanded uriref of the mapped property
+   #
+   attr_reader :property
+
+   # name of the table into which the property is mapped (property domain is an
+   # internal resource class mapped into this table)
+   #
+   attr_reader :table
+
+   # name of the field into which the property is mapped
+   #
+   # if property range is not a literal, the field is a reference to the
+   # resource table
+   #
+   attr_reader :field
+
+   # expanded uriref of the property which this property is a subproperty of
+   #
+   # if set, this property maps into the same table and field as its
+   # superproperty, and is qualified by an additional field named
+   # <field>_subproperty which refers to a uriref resource holding uriref of
+   # this subproperty
+   #
+   attr_accessor :subproperty_of
+
+   attr_writer :superproperty
+
+   # set to +true+ if this property has subproperties
+   #
+   def superproperty?
+     @superproperty or false
+   end
+
+   # name of transitive closure table for a transitive property
+   #
+   # the format of a transitive closure table is:
+   #
+   # - 'resource' field refers to the subject resource id
+   # - '<field>' property field and '<field>_subproperty' qualifier field (in
+   #   case of subproperty) have the same name as in the main table
+   # - 'distance' field holds the distance from subject to object in the RDF
+   #   graph
+   #
+   # the transitive closure table is automatically updated by a trigger on every
+   # update of the main table
+   #
+   attr_accessor :transitive_closure
+ end
+
+ end
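The closure-table format documented for `transitive_closure` can be illustrated by computing its rows in Ruby instead of in a database trigger. This sketch assumes the property is stored in one field per row, so each resource has at most one direct value (for example, a parent id). The `edges` hash and the method name are hypothetical.

The sketch emits one `[resource, value, distance]` row for every resource reachable along the chain, starting at distance 1, and stops on cycles. The comment does not say whether Graffiti's trigger also stores zero-distance rows, so they are left out here.

```ruby
# Compute closure rows for a single-valued property.
# edges: { resource_id => value_id or nil }
# Returns [resource, value, distance] rows; distance 1 means a direct value.
def transitive_closure_rows(edges)
  rows = []
  edges.each_key do |resource|
    seen = { resource => true } # guards against cycles in the data
    current = edges[resource]
    distance = 1
    while current && !seen[current]
      rows << [resource, current, distance]
      seen[current] = true
      current = edges[current] # follow the chain one more step
      distance += 1
    end
  end
  rows
end
```

For example, `transitive_closure_rows({ 3 => 2, 2 => 1, 1 => nil })` yields rows for 3→2 at distance 1, 3→1 at distance 2, and 2→1 at distance 1. With this layout, asking for everything reachable from a resource becomes a single lookup instead of a recursive walk. That is presumably why `define_relation_aliases` switches to `map.transitive_closure` as the table for a transitive clause.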
@@ -0,0 +1,916 @@
+ # Graffiti RDF Store
+ # (originally written for Samizdat project)
+ #
+ # Copyright (c) 2002-2011 Dmitry Borodaenko <angdraug@debian.org>
+ #
+ # This program is free software.
+ # You can distribute/modify this program under the terms of
+ # the GNU General Public License version 3 or later.
+ #
+ # see doc/rdf-storage.txt for introduction and Graffiti Squish definition;
+ # see doc/storage-impl.txt for explanation of implemented algorithms
+ #
+ # vim: et sw=2 sts=2 ts=8 tw=0
+
+ require 'delegate'
+ require 'uri/common'
+ require 'graffiti/rdf_property_map'
+ require 'graffiti/squish'
+
+ module Graffiti
+
+ class SqlNodeBinding
+   def initialize(table_alias, field)
+     @alias = table_alias
+     @field = field
+   end
+
+   attr_reader :alias, :field
+
+   def to_s
+     @alias + '.' + @field
+   end
+
+   alias :inspect :to_s
+
+   def eql?(binding)
+     @alias == binding.alias and @field == binding.field
+   end
+
+   alias :'==' :eql?
+
+   def hash
+     self.to_s.hash
+   end
+ end
+
+
+ class SqlExpression < DelegateClass(Array)
+   def initialize(*parts)
+     super parts
+   end
+
+   def to_s
+     '(' << self.join(' ') << ')'
+   end
+
+   alias :to_str :to_s
+
+   # yield every non-expression part, descending into nested expressions
+   def traverse(&block)
+     self.each do |part|
+       case part
+       when SqlExpression
+         part.traverse(&block)
+       else
+         yield part
+       end
+     end
+   end
+
+   def rebind!(rebind, &block)
+     self.each_with_index do |part, i|
+       case part
+       when SqlExpression
+         part.rebind!(rebind, &block)
+       when SqlNodeBinding
+         if rebind[part]
+           self[i] = rebind[part]
+           yield part if block_given?
+         end
+       end
+     end
+   end
+
+   alias :eql? :'=='
+
+   def hash
+     self.to_s.hash
+   end
+ end
+
+
+ # Transform RDF query pattern graph into a relational join expression.
+ #
+ class SqlMapper
+   include Debug
+
+   def initialize(config, pattern, negative = [], optional = [], global_filter = '')
+     @config = config
+     @global_filter = global_filter
+
+     check_graph(pattern)
+     negative.empty? or check_graph(pattern + negative)
+     optional.empty? or check_graph(pattern + optional)
+
+     map_predicates(pattern, negative, optional)
+     transform
+     generate_tables_and_conditions
+
+     @jc = @aliases = @ac = @global_filter = nil
+   end
+
+   # map clause position to table, field, and table alias
+   #
+   #   position => {
+   #     :subject => {
+   #       :node => node,
+   #       :field => field
+   #     },
+   #     :object => {
+   #       :node => node,
+   #       :field => field
+   #     },
+   #     :map => RdfPropertyMap,
+   #     :bind_mode => < :must_bind | :may_bind | :must_not_bind >,
+   #     :alias => alias
+   #   }
+   #
+   attr_reader :clauses
+
+   # map node to list of positions in clauses
+   #
+   #   node => {
+   #     :positions => [
+   #       { :clause => position, :role => < :subject | :object > }
+   #     ],
+   #     :bind_mode => < :must_bind | :may_bind | :must_not_bind >,
+   #     :colors => { color1 => bind_mode1, ... },
+   #     :ground => < true | false >
+   #   }
+   #
+   attr_reader :nodes
+
+   # list of tables for FROM clause of SQL query
+   attr_reader :from
+
+   # conditions for WHERE clause of SQL query
+   attr_reader :where
+
+   # return node's binding, raise exception if the node isn't bound
+   #
+   def bind(node)
+     (@nodes[node] and @bindings[node] and (binding = @bindings[node].first)
+     ) or raise ProgrammingError,
+       "Node '#{node}' is not bound by the query pattern"
+
+     @nodes[node][:positions].each do |p|
+       if :object == p[:role] and @clauses[ p[:clause] ][:map].subproperty_of
+
+         property = @clauses[ p[:clause] ][:map].property
+         return %{select_subproperty(#{binding}, #{bind(property)})}
+       end
+     end
+
+     binding
+   end
+
+   private
+
+   # Check that the pattern is not a disjoint graph (all nodes are reachable
+   # from one node when edge direction is ignored).
+   #
+   def check_graph(pattern)
+     nodes = pattern.transpose[1, 2].flatten.uniq # all nodes
+
+     seen = [ nodes.shift ]
+     found_more = true
+
+     while found_more and not nodes.empty?
+       found_more = false
+
+       pattern.each do |predicate, subject, object|
+
+         if seen.include?(subject) and nodes.include?(object)
+           seen.push(object)
+           nodes.delete(object)
+           found_more = true
+
+         elsif seen.include?(object) and nodes.include?(subject)
+           seen.push(subject)
+           nodes.delete(subject)
+           found_more = true
+         end
+       end
+     end
+
+     nodes.empty? or raise ProgrammingError, "Query pattern is a disjoint graph"
+   end
+
+   # Stage 1: Predicate Mapping (storage-impl.txt).
+   #
+   def map_predicates(pattern, negative, optional)
+     @nodes = {}
+     @clauses = []
+
+     map_pattern(pattern, :must_bind)
+     map_pattern(negative, :must_not_bind)
+     map_pattern(optional, :may_bind)
+
+     @color_counter = @must_bind_nodes = nil
+
+     refine_ambiguous_properties
+
+     debug do
+       @nodes.each do |node, n|
+         debug %{#{node}: #{n[:bind_mode]} #{n[:colors].inspect}}
+       end
+     end
+   end
+
+   # Label every connected component of the pattern with a different color.
+   #
+   # Pattern clause positions:
+   #
+   #   0. predicate
+   #   1. subject
+   #   2. object
+   #   3. filter
+   #
+   # Returns hash of node colors.
+   #
+   # Implements the {Two-pass Connected Component Labeling algorithm}
+   # [http://en.wikipedia.org/wiki/Connected_Component_Labeling#Two-pass]
+   # with an added special case to exclude _alien_nodes_ from neighbor lists.
+   #
+   # The special case ensures that parts of a may-bind or must-not-bind
+   # subpattern that are only connected through a must-bind node do not connect.
+   #
+   def label_pattern_components(pattern, alien_nodes, augment_alien_nodes = false)
+     return {} if pattern.empty?
+
+     color = {}
+     color_eq = [] # [ [ smaller, larger ], ... ]
+     nodes = pattern.transpose[1, 2].flatten.uniq
+     alien_nodes_here = nodes & alien_nodes
+
+     @color_counter = @color_counter ? @color_counter.next : 0
+     color[ nodes[0] ] = @color_counter
+
+     # first pass
+     1.upto(nodes.size - 1) do |i|
+       node = nodes[i]
+
+       pattern.each do |predicate, subject, object, filter|
+         if node == subject
+           neighbor = object
+         elsif node == object
+           neighbor = subject
+         end
+         next if neighbor.nil? or color[neighbor].nil? or
+           alien_nodes_here.include?(neighbor)
+
+         if color[node].nil?
+           color[node] = color[neighbor]
+         elsif color[node] != color[neighbor] # record color equivalence
+           color_eq |= [ [ color[node], color[neighbor] ].sort ]
+         end
+       end
+
+       color[node] ||= (@color_counter += 1)
+     end
+
+     # second pass
+     nodes.each do |node|
+       while eq = color_eq.rassoc(color[node])
+         color[node] = eq[0]
+       end
+     end
+
+     alien_nodes.push(*nodes).uniq! if augment_alien_nodes
+
+     color
+   end
+
+   def map_pattern(pattern, bind_mode = :must_bind)
+     pattern = pattern.dup
+     @must_bind_nodes ||= []
+     color = label_pattern_components(pattern, @must_bind_nodes, :must_bind == bind_mode)
+
+     pattern.each do |predicate, subject, object, filter, transitive|
+
+       # validate the triple
+       predicate =~ URI::URI_REF or raise ProgrammingError,
+         "Valid uriref expected in predicate position instead of '#{predicate}'"
+
+       [subject, object].each do |node|
+         node =~ SquishQuery::INTERNAL or
+           node =~ SquishQuery::BN or
+           node =~ URI::URI_REF or
+           raise ProgrammingError,
+             "Resource or blank node name expected instead of '#{node}'"
+       end
+
+       # list of possible mappings into internal tables
+       map = @config.map[predicate]
+
+       if transitive and (map.nil? or map.transitive_closure.nil?)
+         raise ProgrammingError,
+           "No transitive closure is defined for #{predicate} property"
+       end
+
+       if map and
+           (subject =~ SquishQuery::BN or
+            subject =~ SquishQuery::INTERNAL or
+            subject =~ SquishQuery::PARAMETER or
+            'resource' == map.table)
+         # internal predicate and subject is mappable to resource table
+
+         i = clauses.size
+
+         @clauses[i] = {
+           :subject => [ { :node => subject, :field => 'id' } ],
+           :object => [ { :node => object, :field => map.field } ],
+           :map => map,
+           :transitive => transitive,
+           :bind_mode => bind_mode
+         }
+         @clauses[i][:filter] = SqlExpression.new(filter) if filter
+
+         [subject, object].each do |node|
+           if @nodes[node]
+             @nodes[node][:bind_mode] =
+               stronger_bind_mode(@nodes[node][:bind_mode], bind_mode)
+           else
+             @nodes[node] = { :positions => [], :bind_mode => bind_mode, :colors => {} }
+           end
+
+           # set of node colors, one for each bind_mode
+           @nodes[node][:colors][ color[node] ] = bind_mode
+         end
+
+         # reverse mapping of the node occurrences
+         @nodes[subject][:positions].push( { :clause => i, :role => :subject } )
+         @nodes[object][:positions].push( { :clause => i, :role => :object } )
+
+         if superp = map.subproperty_of
+           # link subproperty qualifier into the pattern
+           pattern.push(
+             [RdfPropertyMap.qualifier_property(superp), subject, predicate])
+           color[predicate] = color[object]
+
+           # no need to ground both subproperty and superproperty
+           @nodes[object][:ground] = true
+         end
+
+       else
+         # assume reification for unmapped predicates:
+         #
+         #                | (rdf::predicate ?_stmt_N p)
+         #   (p s o)  ->  | (rdf::subject ?_stmt_N s)
+         #                | (rdf::object ?_stmt_N o)
+         #
+         rdf = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
+         stmt = "?_stmt_#{pattern.size}" # pattern grows on each reification, so N is unique
+         pattern.push([rdf + 'predicate', stmt, predicate],
+                      [rdf + 'subject', stmt, subject],
+                      [rdf + 'object', stmt, object])
+         color[stmt] = color[predicate] = color[object]
+       end
+     end
+   end
+
+   # Select the stronger of the two bind modes, in the following order of
+   # preference:
+   #
+   #   :must_bind -> :must_not_bind -> :may_bind
+   #
+   def stronger_bind_mode(mode1, mode2)
+     if mode1 != mode2 and (:must_bind == mode2 or :may_bind == mode1)
+       mode2
+     else
+       mode1
+     end
+   end
+
+   # If a node can be mapped to more than one [table, field] pair, see if it can
+   # be refined based on other occurrences of this node in other query clauses.
+   #
+   def refine_ambiguous_properties
+     @nodes.each do |node, n|
+       map = n[:positions]
+
+       map.each_with_index do |p, i|
+         big = @clauses[ p[:clause] ][ p[:role] ]
+         next if big.size <= 1 # no refining needed
+
+         debug { "#{node}: #{big.inspect}" }
+
+         (i + 1).upto(map.size - 1) do |j|
+           small_p = map[j]
+           small = @clauses[ small_p[:clause] ][ small_p[:role] ]
+
+           refined = big & small
+           if refined.size > 0 and refined.size < big.size
+
+             # refine the node...
+             @clauses[ p[:clause] ][ p[:role] ] = big = refined
+
+             # ...and its pair
+             @clauses[ p[:clause] ][ opposite_role(p[:role]) ].collect! {|pair|
+               refined.assoc(pair[0]) ? pair : nil
+             }.compact!
+           end
+         end
+       end
+     end
+
+     # drop remaining ambiguous mappings
+     # todo: split query for ambiguous mappings
+     @clauses.each do |clause|
+       next if clause.nil? # means it was reified
+       clause[:subject] = clause[:subject].first
+       clause[:object] = clause[:object].first
+     end
+   end
425
+
426
+ def opposite_role(role)
427
+ :subject == role ? :object : :subject
428
+ end
429
+
430
+ # Return current value of alias counter, remember which table it was assigned
431
+ # to, and increment the counter.
432
+ #
433
+ def next_alias(table, node, bind_mode = @nodes[node][:bind_mode])
434
+ @ac ||= 'a'
435
+ @aliases ||= {}
436
+
437
+ a = @ac.dup
438
+ @aliases[a] = {
439
+ :table => table,
440
+ :node => node,
441
+ :bind_mode => bind_mode,
442
+ :filter => []
443
+ }
444
+
445
+ @ac.next!
446
+ return a
447
+ end
448
+
449
+ def define_relation_aliases
450
+ @nodes.each do |node, n|
451
+
452
+ positions = n[:positions]
453
+
454
+ # go through all clauses with this node in subject position
455
+ positions.each_with_index do |p, i|
456
+ next if :subject != p[:role] or @clauses[ p[:clause] ][:alias]
457
+
458
+ clause = @clauses[ p[:clause] ]
459
+ map = clause[:map]
460
+ table = clause[:transitive] ? map.transitive_closure : map.table
461
+
462
+ # see if we've already mapped this node to the same table before
463
+ 0.upto(i - 1) do |j|
464
+ similar_clause = @clauses[ positions[j][:clause] ]
465
+
466
+ if similar_clause[:alias] and
467
+ similar_clause[:map].table == table and
468
+ similar_clause[:map].field != map.field
469
+ # same node, same table, different field -> same alias
470
+
471
+ clause[:alias] = similar_clause[:alias]
472
+ break
473
+ end
474
+ end
475
+
476
+ if clause[:alias].nil?
477
+ clause[:alias] =
478
+ if clause[:transitive]
479
+ # transitive clause bind mode overrides a stronger node bind mode
480
+ #
481
+ # fixme: generic case for multiple aliases per node
482
+ next_alias(table, node, clause[:bind_mode])
483
+ else
484
+ next_alias(table, node)
485
+ end
486
+ end
487
+ end
488
+ end # optimize: unnecessary aliases are generated
489
+ end
490
+
491
+ def update_alias_filters
492
+ @clauses.each do |c|
493
+ if c[:filter]
494
+ @aliases[ c[:alias] ][:filter].push(c[:filter])
495
+ end
496
+ end
497
+ end
498
+
499
+ # Stage 2: Relation Aliases and Join Conditions (storage-impl.txt).
500
+ #
501
+ # Result is map of aliases in @aliases and list of join conditions in @jc.
502
+ #
503
+ def transform
504
+ define_relation_aliases
505
+ update_alias_filters
506
+
507
+ # [ [ binding1, binding2 ], ... ]
508
+ @jc = []
509
+ @bindings = {}
510
+
511
+ @nodes.each do |node, n|
512
+ positions = n[:positions]
513
+
514
+ # node binding
515
+ first = positions.first
516
+ clause = @clauses[ first[:clause] ]
517
+ a = clause[:alias]
518
+ binding = SqlNodeBinding.new(a, clause[ first[:role] ][:field])
519
+ @bindings[node] = [ binding ]
520
+
521
+ # join conditions
522
+ 1.upto(positions.size - 1) do |i|
523
+ p = positions[i]
524
+ clause2 = @clauses[ p[:clause] ]
525
+ binding2 = SqlNodeBinding.new(clause2[:alias], clause2[ p[:role] ][:field])
526
+
527
+ unless @bindings[node].include?(binding2)
528
+ @bindings[node].push(binding2)
529
+ @jc.push([binding, binding2, node])
530
+ n[:ground] = true
531
+ end
532
+ end
533
+
534
+ # ground non-blank nodes
535
+ if node !~ SquishQuery::BN
536
+
537
+ if node =~ SquishQuery::INTERNAL # internal resource id
538
+ @aliases[a][:filter].push SqlExpression.new(binding, '=', $1)
539
+
540
+ elsif node =~ SquishQuery::PARAMETER or node =~ SquishQuery::LITERAL
541
+ @aliases[a][:filter].push SqlExpression.new(binding, '=', node)
542
+
543
+ elsif node =~ URI::URI_REF # external resource uriref
544
+
545
+ r = nil
546
+ positions.each do |p|
547
+ next unless :subject == p[:role]
548
+
549
+ c = @clauses[ p[:clause] ]
550
+ if 'resource' == c[:map].table
551
+ r = c[:alias] # reuse existing mapping to resource table
552
+ break
553
+ end
554
+ end
555
+
556
+ if r.nil?
557
+ r = next_alias('resource', node)
558
+ r_binding = SqlNodeBinding.new(r, 'id')
559
+ @bindings[node].unshift(r_binding)
560
+ @jc.push([ binding, r_binding, node ])
561
+ end
562
+
563
+ @aliases[r][:filter].push SqlExpression.new(
564
+ SqlNodeBinding.new(r, 'uriref'), '=', "'t'", 'AND',
565
+ SqlNodeBinding.new(r, 'label'), '=', %{'#{node}'})
566
+
567
+ else
568
+ raise RuntimeError,
569
+ "Invalid node '#{node}' should never occur at this point"
570
+ end
571
+
572
+ n[:ground] = true
573
+ end
574
+ end
575
+
576
+ debug do
577
+ @aliases.each {|alias_name, a| debug %{#{alias_name}: #{a.inspect}} }
578
+ @jc.each {|jc| debug jc.inspect }
579
+ end
580
+ end
+
+   # Produce SQL FROM and WHERE clauses from results of transform().
+   #
+   def generate_tables_and_conditions
+     main_path, seen = jc_subgraph_path(:must_bind)
+     debug { main_path.inspect }
+
+     main_path and not main_path.empty? or raise RuntimeError,
+       'Failed to find table aliases for main query'
+
+     @where = ground_dangling_blank_nodes(main_path)
+
+     joins = ''
+     subquery_count = 'a'
+
+     [ :must_not_bind, :may_bind ].each do |bind_mode|
+       loop do
+         sub_path, new = jc_subgraph_path(bind_mode, seen)
+         break if sub_path.nil? or sub_path.empty?
+
+         debug { sub_path.inspect }
+
+         sub_query, sub_join = sub_path.partition {|a,| main_path.assoc(a).nil? }
+         # fixme: make sure that sub_join is not empty
+
+         if 1 == sub_query.size
+           # simplified case: join single table directly without a subquery
+           join_alias, = sub_query.first
+           a = @aliases[join_alias]
+           join_target = a[:table]
+           join_conditions = jc_path_to_join_conditions(sub_join) + a[:filter]
+
+         else
+           # left join subquery to the main query
+           join_alias = '_subquery_' << subquery_count
+           subquery_count.next!
+
+           sub_join = subquery_jc_path(sub_join, join_alias)
+           rebind = rebind_subquery(sub_path, join_alias)
+           select_nodes = subquery_select_nodes(rebind, main_path, sub_join)
+
+           join_conditions = jc_path_to_join_conditions(sub_join, rebind,
+             select_nodes)
+
+           select_nodes = select_nodes.keys.collect {|b|
+             b.to_s << ' AS ' << rebind[b].field
+           }.join(', ')
+
+           tables, conditions = jc_path_to_tables_and_conditions(sub_path)
+
+           join_target = "(\nSELECT #{select_nodes}\nFROM #{tables}"
+           join_target << "\nWHERE " << conditions unless conditions.empty?
+           join_target << "\n)"
+           join_target.gsub!(/\n(?!\)\z)/, "\n ")
+         end
+
+         joins << ("\nLEFT JOIN " + join_target + ' AS ' + join_alias + ' ON ' +
+           join_conditions.uniq.join(' AND '))
+
+         if :must_not_bind == bind_mode
+           left_join_is_null(main_path, sub_join)
+         end
+       end
+     end
+
+     @from, main_where = jc_path_to_tables_and_conditions(main_path)
+
+     @from << joins
+
+     @where.push('(' + main_where + ')') unless main_where.empty?
+     @where.push('(' + @global_filter + ')') unless @global_filter.empty?
+     @where = @where.join("\nAND ")
+   end
+
+   # Produce a subgraph path through join conditions linking all aliases with
+   # given _bind_mode_ that form a same-color connected component of the join
+   # conditions graph and weren't processed yet:
+   #
+   #   path = [ [start, []], [ next, [ jc, ... ] ], ... ]
+   #
+   # Update the _seen_ hash for all aliases included in the produced path.
+   #
+   def jc_subgraph_path(bind_mode, seen = {})
+     start = find_alias(bind_mode, seen)
+     return nil if start.nil?
+
+     new = {}
+     new[start] = true
+     path = [ [start, []] ]
+     colors = @nodes[ @aliases[start][:node] ][:colors].keys
+
+     loop do # while we can find more connecting joins of the same color
+       join_alias = nil
+
+       @jc.each do |jc|
+         # use cases:
+         # - seen is empty (composing the must-bind join)
+         # - seen is not empty (composing a subquery)
+
+         next if (colors & @nodes[ jc[2] ][:colors].keys).empty?
+
+         0.upto(1) do |i|
+           a_seen = jc[i].alias
+           a_next = jc[1-i].alias
+
+           if not new[a_next] and (
+               ((new[a_seen] or seen[a_seen]) and
+                (@aliases[a_next][:bind_mode] == bind_mode)
+                # connect an untouched node of matching bind mode
+               ) or (
+                 new[a_seen] and seen[a_next] and
+                 # connect subquery to the rest of the query...
+                 @aliases[a_seen][:bind_mode] == bind_mode
+                 # ...but only go one step deep
+               ))
+
+             join_alias = a_next
+             break
+           end
+         end
+
+         break if join_alias
+       end
+
+       break if join_alias.nil?
+
+       # join it to all seen aliases
+       join_on = @jc.find_all do |jc|
+         a1, a2 = jc[0, 2].collect {|b| b.alias }
+         (new[a1] and a2 == join_alias) or (new[a2] and a1 == join_alias)
+       end
+
+       new[join_alias] = true
+       path.push([join_alias, join_on])
+     end
+
+     seen.merge!(new)
+     [ path, new ]
+   end
+
+   # Return the first alias with the given _bind_mode_ that is not in _seen_
+   # yet, or nil if there is none.
+   #
+   def find_alias(bind_mode, seen = {})
+     @aliases.each do |alias_name, a|
+       next if seen[alias_name] or a[:bind_mode] != bind_mode
+       return alias_name
+     end
+
+     nil
+   end
+
+   # Ground all must-bind and must-not-bind blank nodes that weren't grounded
+   # elsewhere (marked as ground, or referenced in the global filter): a
+   # must-bind node gets an IS NOT NULL condition, which acts as an existential
+   # quantifier, and a must-not-bind node gets an IS NULL condition. Each
+   # condition is placed on the first binding of the node within the main query.
+   #
+   def ground_dangling_blank_nodes(main_path)
+     conditions = []
+     ground_nodes = @global_filter.scan(SquishQuery::BN_SCAN)
+
+     @nodes.each do |node, n|
+       next if (n[:ground] or ground_nodes.include?(node))
+
+       expression =
+         case n[:bind_mode]
+         when :must_bind
+           'IS NOT NULL'
+         when :must_not_bind
+           'IS NULL'
+         else
+           next
+         end
+
+       @bindings[node].each do |binding|
+         if main_path.assoc(binding.alias)
+           conditions.push SqlExpression.new(binding, expression)
+           break
+         end
+       end
+     end
+
+     conditions
+   end
+
+   # Join a subquery to the main query: for each alias shared between the two,
+   # link the 'id' field of the corresponding table within and outside the
+   # subquery. If no node is bound to the 'id' field, create a virtual node
+   # bound to it, so that it can be rebound by rebind_subquery().
+   #
+   def subquery_jc_path(sub_join, join_alias)
+     sub_join.empty? and raise ProgrammingError,
+       "Unexpected empty subquery, check your RDF storage configuration"
+     # fixme: reify instead of raising an exception
+
+     sub_join.transpose[0].uniq.collect do |a|
+       binding = SqlNodeBinding.new(a, 'id')
+
+       exists = false
+       @nodes.each do |node, n|
+         if @bindings[node].include?(binding)
+           exists = true
+           break
+         end
+       end
+
+       unless exists
+         node = '?' + join_alias + '_' + a
+         @nodes[node] = { :ground => true }
+         @bindings[node] = [ binding ]
+       end
+
+       [ a, [[ binding, binding ]] ]
+     end
+   end
+
+   # Generate a hash that maps all bindings that have been wrapped inside the
+   # subquery described by _sub_path_ (a jc path, see jc_subgraph_path()) to
+   # rebound bindings based on _join_alias_, so that they may still be used in
+   # the main query.
+   #
+   def rebind_subquery(sub_path, join_alias)
+     rebind = {}
+     field_count = 'a'
+
+     wrapped = {}
+     sub_path.each {|a,| wrapped[a] = true }
+
+     @nodes.each do |node, n|
+       @bindings[node].each do |b|
+         if wrapped[b.alias] and rebind[b].nil?
+           field = '_field_' << field_count
+           field_count.next!
+           rebind[b] = SqlNodeBinding.new(join_alias, field)
+         end
+       end
+     end
+
+     rebind
+   end
+
+   # Go through the global filter, the filters in the main query, and the join
+   # conditions attaching the subquery to the main query, rebind the bindings
+   # for nodes wrapped inside the subquery, and return a hash with keys for all
+   # bindings that should be selected from the subquery.
+   #
+   def subquery_select_nodes(rebind, main_path, sub_join)
+     select_nodes = {}
+
+     # update the global filter
+     @nodes.each do |node, n|
+       if r = rebind[ @bindings[node].first ]
+         @global_filter.gsub!(/#{Regexp.escape(node)}\b/) do
+           select_nodes[ @bindings[node].first ] = true
+           r.to_s
+         end
+       end
+     end
+
+     # update filters in the main query
+     main_path.each do |a,|
+       next if sub_join.assoc(a)
+
+       @aliases[a][:filter].each do |f|
+         f.rebind!(rebind) do |b|
+           select_nodes[b] = true
+         end
+       end
+     end
+
+     # update the subquery join path
+     sub_join.each do |a, jcs|
+       jcs.each do |jc|
+         select_nodes[ jc[0] ] = true
+         jc[1] = rebind[ jc[1] ]
+       end
+     end
+
+     # fixme: update main SELECT list
+     select_nodes
+   end
+
+   # Transform a jc path (see jc_subgraph_path()) into a list of join
+   # conditions. The _rebind_ and _select_nodes_ arguments are currently unused:
+   # conditions are built from the bindings exactly as they appear in _jc_path_,
+   # so a subquery join path has to be rebound beforehand by
+   # subquery_select_nodes(). Raises RuntimeError if no conditions are found.
+   #
+   def jc_path_to_join_conditions(jc_path, rebind = nil, select_nodes = nil)
+     conditions = []
+
+     jc_path.each do |a, jcs|
+       jcs.each do |b1, b2, n|
+         conditions.push SqlExpression.new(b1, '=', b2)
+       end
+     end
+
+     conditions.empty? and raise RuntimeError,
+       "Failed to join subquery to the main query"
+
+     conditions
+   end
+
+   # Generate FROM and WHERE clauses from a jc path (see jc_subgraph_path()).
+   #
+   def jc_path_to_tables_and_conditions(path)
+     first, = path[0]
+     a = @aliases[first]
+
+     tables = a[:table] + ' AS ' + first
+     conditions = a[:filter]
+
+     path[1, path.size - 1].each do |join_alias, join_on|
+       a = @aliases[join_alias]
+
+       tables <<
+         %{\nINNER JOIN #{a[:table]} AS #{join_alias} ON } <<
+         (
+           join_on.collect {|b1, b2| SqlExpression.new(b1, '=', b2) } +
+           a[:filter]
+         ).uniq.join(' AND ')
+     end
+
+     [ tables, conditions.uniq.join("\nAND ") ]
+   end
+
+   # For a must-not-bind join, find the key field on the side of each join
+   # condition that is not part of the main query and require it to be NULL,
+   # so that the LEFT JOIN keeps only the main query rows that have no match.
+   #
+   def left_join_is_null(main_path, sub_join)
+     sub_join.each do |a, jcs|
+       jcs.each do |jc|
+         0.upto(1) do |i|
+           if main_path.assoc(jc[i].alias).nil?
+             @where.push SqlExpression.new(jc[i], 'IS NULL')
+             break
+           end
+         end
+       end
+     end
+   end
+ end
+
+ end
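The path format documented for jc_subgraph_path() in the file above can be reproduced with plain Ruby data. This is a simplified sketch, not the Graffiti code: aliases are strings, a join condition is a pair of alias names, and node colors are ignored (all assumptions made for illustration). It keeps the method's two connection rules: grow the component with an alias of the same bind mode, or add a one-step link back to an alias that was already processed.

```ruby
# Simplified, hypothetical model of SqlMapper#jc_subgraph_path: aliases are
# strings, a join condition is an [alias1, alias2] pair, colors are ignored.
def subgraph_path(bind_modes, join_conditions, bind_mode, seen)
  # start from the first unprocessed alias with the requested bind mode
  start = bind_modes.keys.find {|a| !seen[a] && bind_modes[a] == bind_mode }
  return nil if start.nil?

  added = { start => true }
  path = [[start, []]]

  loop do
    join_alias = nil

    join_conditions.each do |a1, a2|
      [[a1, a2], [a2, a1]].each do |from, to|
        next if added[to]
        # grow the component with an alias of the same bind mode...
        grow = (added[from] || seen[from]) && bind_modes[to] == bind_mode
        # ...or link it one step back to an already processed alias
        link_back = added[from] && seen[to] && bind_modes[from] == bind_mode
        if grow || link_back
          join_alias = to
          break
        end
      end
      break if join_alias
    end

    break if join_alias.nil?

    # record every join condition between the new alias and the path so far
    join_on = join_conditions.select do |a1, a2|
      (added[a1] && a2 == join_alias) || (added[a2] && a1 == join_alias)
    end
    added[join_alias] = true
    path << [join_alias, join_on]
  end

  seen.merge!(added)
  [path, added]
end

bind_modes = { 'a' => :must_bind, 'b' => :must_bind, 'c' => :may_bind }
join_conditions = [%w[a b], %w[b c]]
seen = {}

main_path, = subgraph_path(bind_modes, join_conditions, :must_bind, seen)
sub_path, = subgraph_path(bind_modes, join_conditions, :may_bind, seen)
p main_path  # [["a", []], ["b", [["a", "b"]]]]
p sub_path   # [["c", []], ["b", [["b", "c"]]]]
```

Because b already belongs to the main path, the may-bind path starting at c ends with a single link back to b, which is the shape generate_tables_and_conditions() later splits into the joined table and its join point.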
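In generate_tables_and_conditions(), the choice between joining a table directly and wrapping a subquery depends only on how many aliases of a sub path are new to the query. A minimal sketch of that decision on hand-written paths (the alias names and paths are made up; the `_subquery_` prefix follows the code above, while the counter here is advanced with `succ` instead of being mutated with `next!`):

```ruby
# Hypothetical main path and sub paths, in the [alias, join_conditions]
# shape produced by jc_subgraph_path.
main_path = [['a', []], ['b', [%w[a b]]]]
sub_paths = [
  [['c', []], ['b', [%w[b c]]]],                   # one new alias
  [['d', []], ['e', [%w[d e]]], ['a', [%w[a d]]]]  # two new aliases
]

subquery_count = 'a'
plans = sub_paths.map do |sub_path|
  # aliases missing from the main path form the subquery; the others join it
  sub_query, sub_join = sub_path.partition {|a,| main_path.assoc(a).nil? }

  if sub_query.size == 1
    # a single new table can be LEFT JOINed directly
    [:table, sub_query.first.first, sub_join.map(&:first)]
  else
    # several new tables are wrapped in a named subquery
    join_alias = '_subquery_' + subquery_count
    subquery_count = subquery_count.succ
    [:subquery, join_alias, sub_join.map(&:first)]
  end
end

plans.each {|plan| p plan }
```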
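rebind_subquery() gives every binding inside a subquery a new column on the subquery alias, so that conditions outside the subquery can still refer to it. The sketch below models that mapping with a hypothetical `NodeBinding` struct standing in for SqlNodeBinding and a hypothetical helper name, `rebind_bindings`; the bindings themselves are invented for the example.

```ruby
# Hypothetical stand-in for SqlNodeBinding: a table alias plus a field name.
NodeBinding = Struct.new(:table_alias, :field) do
  def to_s
    "#{table_alias}.#{field}"
  end
end

# Simplified model of rebind_subquery(): every binding whose alias is wrapped
# in the subquery gets a fresh "<join_alias>._field_<x>" binding, x = a, b, ...
def rebind_bindings(bindings_by_node, sub_path, join_alias)
  wrapped = sub_path.map(&:first)
  field_count = 'a'
  rebind = {}

  bindings_by_node.each_value do |bindings|
    bindings.each do |b|
      next unless wrapped.include?(b.table_alias) && rebind[b].nil?

      rebind[b] = NodeBinding.new(join_alias, '_field_' + field_count)
      field_count = field_count.succ
    end
  end

  rebind
end

bindings_by_node = {
  '?msg'    => [NodeBinding.new('a', 'id'), NodeBinding.new('c', 'subject')],
  '?author' => [NodeBinding.new('c', 'object')]
}
rebind = rebind_bindings(bindings_by_node, [['c', []], ['d', []]], '_subquery_a')
rebind.each {|from, to| puts "#{from} -> #{to}" }
# c.subject -> _subquery_a._field_a
# c.object -> _subquery_a._field_b
```

Struct instances compare and hash by value, so a binding can be looked up in the map with a freshly built key, much as the original relies on equal SqlNodeBinding keys.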
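The FROM clause built by jc_path_to_tables_and_conditions() and the LEFT JOIN loop can be illustrated with plain strings. In this sketch the table names (Resource, Statement) and field names are assumptions for illustration, and join conditions are pre-rendered field pairs. For a must-not-bind pattern, left_join_is_null() adds an IS NULL test on the joined side, the usual LEFT JOIN anti-join idiom.

```ruby
# Hypothetical schema: alias => table name (illustrative only).
TABLES = { 'a' => 'Resource', 'b' => 'Statement', 'c' => 'Statement' }.freeze

# Like jc_path_to_tables_and_conditions: the first alias opens the FROM
# clause and every later alias is INNER JOINed on its recorded conditions.
def path_to_from(path)
  first, = path.first
  from = "#{TABLES[first]} AS #{first}"
  path.drop(1).each do |join_alias, join_on|
    on = join_on.map {|left, right| "#{left} = #{right}" }.uniq.join(' AND ')
    from += "\nINNER JOIN #{TABLES[join_alias]} AS #{join_alias} ON #{on}"
  end
  from
end

main_path = [['a', []], ['b', [%w[a.id b.subject]]]]
from = path_to_from(main_path)

# A must-not-bind alias is LEFT JOINed and its join field must be NULL,
# so only rows without a matching statement survive (an anti-join).
from += "\nLEFT JOIN #{TABLES['c']} AS c ON b.object = c.subject"
where = 'c.subject IS NULL'

puts "SELECT a.id\nFROM #{from}\nWHERE #{where}"
```

The printed query keeps only the (a, b) rows for which no c row has a subject equal to b.object.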
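The `gsub!` call that finishes a subquery in generate_tables_and_conditions() relies on a negative lookahead: every newline gains indentation except the one before the final closing parenthesis. A self-contained sketch with made-up select list, table, and condition strings (a two-space indent is assumed here):

```ruby
# Made-up pieces of a subquery, assembled the same way as in
# generate_tables_and_conditions().
select_nodes = 'b.id AS _field_a'
tables = 'Statement AS b'
conditions = 'b.predicate = 5'

join_target = +"(\nSELECT #{select_nodes}\nFROM #{tables}"
join_target << "\nWHERE " << conditions unless conditions.empty?
join_target << "\n)"

# Indent every line except the closing ")": the lookahead (?!\)\z) skips the
# newline that is followed only by ")" at the very end of the string.
join_target.gsub!(/\n(?!\)\z)/, "\n  ")
puts join_target
```

This prints the SELECT, FROM, and WHERE lines indented between an unindented "(" and ")".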