graffiti 2.1

@@ -0,0 +1,126 @@
1
+ Samizdat RDF Implementation Report
2
+ ==================================
3
+
4
+ http://lists.w3.org/Archives/Public/www-rdf-interest/2003Sep/0043.html
5
+
6
+ Implementation
7
+ --------------
8
+
9
+ http://www.nongnu.org/samizdat/
10
+
11
+ Samizdat is a generic RDF-based engine for building collaboration and
12
+ open publishing web sites. Samizdat will let everyone publish, view,
13
+ comment, edit, and aggregate text and multimedia resources, vote on
14
+ ratings and classifications, filter resources by flexible sets of
15
+ criteria, cooperate and coordinate on all kinds of activities (see
16
+ Design Goals document). Samizdat intends to promote values of freedom,
17
+ openness, equality, and cooperation.
18
+
19
+ The Samizdat engine is implemented in the Ruby programming language, using the
20
+ Apache mod_ruby module and the PostgreSQL RDBMS, and is available under the GNU
21
+ General Public License, version 2 or later.
22
+
23
+ Project development started in December 2002; the first public release was
24
+ announced in June 2003. This report refers to Samizdat 0.0.4,
25
+ released on 2003-09-01.
26
+
27
+ Functionality covered by this version includes: registering site
28
+ members, publishing and replying to messages, uploading multimedia
29
+ messages, voting on standard tags on resources, and hand-editing or using
30
+ a GUI to construct and publish Squish queries that can be used to
31
+ search and filter site resources.
32
+
33
+
34
+ RDF Schema
35
+ ----------
36
+
37
+ Samizdat defines its own RDF schema for description of site members,
38
+ published messages, votes, and other site resources (see Concepts
39
+ document). One of the outstanding features of Samizdat schema is the use
40
+ of statement reification in approval of content classification with
41
+ votes cast by site members.
42
+
43
+ Samizdat RDF schema uses Dublin Core metadata where applicable; also,
44
+ integration of site member descriptions with FOAF is planned.
45
+
46
+ One of the problems encountered in Samizdat RDF Schema development was
47
+ the lack of standard metadata describing discussion threads. While other
48
+ properties defined in Samizdat schema denote Samizdat-specific concepts,
49
+ such as "vote" and "rating", it is more desirable to use commonly agreed
50
+ metadata for threading structure in place of implementation-local
51
+ "thread" and "inReplyTo" properties.
52
+
53
+
54
+ RDF Import and Export
55
+ ---------------------
56
+
57
+ While Samizdat model follows RDF Concepts and RDF Semantics
58
+ recommendations (with the exceptions noted below), the engine does
59
+ not externally interchange RDF data and thus does not use RDF/XML or
60
+ other RDF serialization format. It is assumed that, when the need for
61
+ RDF import and export arises, it can be implemented externally on top of
62
+ the Samizdat RDF storage module and using existing RDF frameworks such
63
+ as Redland.
64
+
65
+
66
+ Datatyped Literals
67
+ ------------------
68
+
69
+ Samizdat doesn't implement datatyped literals, and relies on underlying
70
+ PostgreSQL capabilities for mapping between literal values and their
71
+ string representations. Outside of SQL context, literals are interpreted
72
+ as opaque strings; XML literals are not treated specially, and datatype
73
+ information is not preserved.
74
+
75
+ However, support of XML schema datatypes is considered necessary in
76
+ order to untie a Samizdat knowledge base from specifics of underlying
77
+ RDF storage, and will be implemented as a prerequisite for migration to
78
+ a selection of alternative RDF storage backends (candidates are FramerD,
79
+ 3store, and Redland).
80
+
81
+
82
+ Language Tags
83
+ -------------
84
+
85
+ Literal language tags are not honoured; the "dc:language" property is
86
+ supposed to be used to denote message language.
87
+
88
+
89
+ Entailments
90
+ -----------
91
+
92
+ Samizdat RDF storage only implements simple entailment; vocabulary
93
+ entailment is not implemented yet. At the moment, simple entailment
94
+ suffices for all features of the Samizdat engine. If and when vocabulary
95
+ entailment becomes necessary, it will be implemented in Samizdat RDF
96
+ storage module or relegated to an alternative RDF storage backend,
97
+ depending on status of backend alternatives for Samizdat at that time.
98
+
99
+
100
+ Query Support
101
+ -------------
102
+
103
+ Samizdat RDF storage implements a translation of RDF query graphs
104
+ written in extended Squish into relational SQL queries and allows purely
105
+ relational representation of selected properties of site resources (see
106
+ RDF Storage and Storage Implementation documents).
107
+
108
+ It must be noted that, at the moment, the status of RDF query language
109
+ standards is unsatisfactory.
110
+
111
+ The DAML Query Language abstract specification provides an excellent formal
112
+ basis, but does not encompass all capabilities of existing RDF query
113
+ languages. Also, existing query languages are limited in one way or
114
+ another, are underformalized (most are defined by a single
115
+ implementation), and often overloaded with baroque syntax.
116
+
117
+ Two major features that were missed the most in existing query languages
118
+ at the time of Samizdat RDF storage implementation were: knowledge base
119
+ updates that can merge complex constructs into the site KB graph
120
+ (implemented in Samizdat RDF Data Manipulation Language), and workflow
121
+ control providing at least transaction rollback (in Samizdat, underlying
122
+ PostgreSQL transactions are used). Other Squish extensions implemented
123
+ in Samizdat are literal conditions and answer collection ordering
124
+ (currently, relegated to PostgreSQL; ideally, interpreted according to
125
+ literal datatypes).
126
+
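Reading aid for the Query Support section of the report: the idea of translating a query graph into SQL can be pictured with a toy translator. This is a minimal sketch under stated assumptions, not the algorithm Samizdat uses. The `PROPERTY_MAP` table, the `pattern_to_sql` helper, and the `message` table with its `title` and `creator` columns are hypothetical, and the sketch only handles clauses that share one subject and map to one table.

```ruby
# Hypothetical property map: predicate URI => [table, column].
PROPERTY_MAP = {
  'http://purl.org/dc/elements/1.1/title'   => %w[message title],
  'http://purl.org/dc/elements/1.1/creator' => %w[message creator]
}.freeze

# Translate [predicate, subject, object] clauses into one SELECT statement.
# Assumption: every clause shares the same subject and maps to the same table.
def pattern_to_sql(pattern)
  subjects = pattern.map { |_predicate, subject, _object| subject }.uniq
  tables = pattern.map { |predicate, _s, _o| PROPERTY_MAP.fetch(predicate).first }.uniq
  unless subjects.size == 1 && tables.size == 1
    raise ArgumentError, 'sketch supports a single subject and table only'
  end

  columns = pattern.map do |predicate, _subject, object|
    _table, column = PROPERTY_MAP.fetch(predicate)
    # Name each output column after the query variable, minus its leading "?".
    "a.#{column} AS #{object.sub(/\A\?/, '')}"
  end

  "SELECT #{columns.join(', ')} FROM #{tables.first} AS a"
end

DC = 'http://purl.org/dc/elements/1.1/'
sql = pattern_to_sql([
  [DC + 'title',   '?msg', '?title'],
  [DC + 'creator', '?msg', '?author']
])
puts sql
```

Both clauses share the subject `?msg`, so a single table alias is enough. Clauses that map to different tables would need join conditions, which this sketch deliberately leaves out.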
data/graffiti.gemspec ADDED
@@ -0,0 +1,21 @@
1
+ Gem::Specification.new do |spec|
2
+ spec.name = 'graffiti'
3
+ spec.version = '2.1'
4
+ spec.author = 'Dmitry Borodaenko'
5
+ spec.email = 'angdraug@debian.org'
6
+ spec.homepage = 'https://github.com/angdraug/graffiti'
7
+ spec.summary = 'Relational RDF store for Ruby'
8
+ spec.description = <<-EOF
9
+ Graffiti is an RDF store based on dynamic translation of RDF queries into SQL.
10
+ Graffiti allows one to map any relational database schema into RDF semantics
11
+ and vice versa, to store any RDF data in a relational database.
12
+
13
+ Graffiti uses Sequel to connect to database backend and provides a DBI-like
14
+ interface to run RDF queries in Squish query language from Ruby applications.
15
+ EOF
16
+ spec.files = `git ls-files`.split "\n"
17
+ spec.test_files = Dir['test/ts_*.rb']
18
+ spec.license = 'GPL-3.0-or-later'
19
+ spec.add_dependency('syncache')
20
+ spec.add_dependency('sequel')
21
+ end
data/lib/graffiti.rb ADDED
@@ -0,0 +1,15 @@
1
+ # Graffiti RDF Store
2
+ # (originally written for Samizdat project)
3
+ #
4
+ # Copyright (c) 2002-2009 Dmitry Borodaenko <angdraug@debian.org>
5
+ #
6
+ # This program is free software.
7
+ # You can distribute/modify this program under the terms of
8
+ # the GNU General Public License version 3 or later.
9
+ #
10
+ # see doc/rdf-storage.txt for introduction and Graffiti Squish definition;
11
+ # see doc/storage-impl.txt for explanation of implemented algorithms
12
+ #
13
+ # vim: et sw=2 sts=2 ts=8 tw=0
14
+
15
+ require 'graffiti/store'
@@ -0,0 +1,34 @@
1
+ # Graffiti RDF Store
2
+ # (originally written for Samizdat project)
3
+ #
4
+ # Copyright (c) 2002-2011 Dmitry Borodaenko <angdraug@debian.org>
5
+ #
6
+ # This program is free software.
7
+ # You can distribute/modify this program under the terms of
8
+ # the GNU General Public License version 3 or later.
9
+ #
10
+ # see doc/rdf-storage.txt for introduction and Graffiti Squish definition;
11
+ # see doc/storage-impl.txt for explanation of implemented algorithms
12
+ #
13
+ # vim: et sw=2 sts=2 ts=8 tw=0
14
+
15
+ module Graffiti
16
+
17
+ module Debug
18
+ private
19
+
20
+ DEBUG = false
21
+
22
+ def debug(message = nil)
23
+ return unless DEBUG
24
+
25
+ log message if message
26
+ log yield if block_given?
27
+ end
28
+
29
+ def log(message)
30
+ STDERR << 'Graffiti: ' << message.to_s << "\n"
31
+ end
32
+ end
33
+
34
+ end
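Note on the `debug` helper above: accepting a block means the message is only computed when debugging is switched on. The following is a minimal sketch of that pattern, assuming a stand-in module `LazyDebug` and a hypothetical `Worker` class; neither is part of the gem.

```ruby
# Stand-in for a debug mixin with a compile-time switch (names are hypothetical).
module LazyDebug
  ENABLED = false

  def debug(message = nil)
    return unless ENABLED

    warn "debug: #{message}" if message
    warn "debug: #{yield}" if block_given?
  end
end

class Worker
  include LazyDebug

  attr_reader :dumps

  def initialize
    @dumps = 0
  end

  # Pretend this is costly; count how often it actually runs.
  def expensive_dump
    @dumps += 1
    'large state'
  end

  def run
    debug { expensive_dump } # block form: never evaluated while ENABLED is false
    debug expensive_dump     # argument form: evaluated before debug is even called
    :done
  end
end

worker = Worker.new
worker.run
puts worker.dumps
```

Only the argument form pays for `expensive_dump` here, which is presumably why the mapper code in this diff passes its larger dumps as blocks.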
@@ -0,0 +1,20 @@
1
+ # Graffiti RDF Store
2
+ # (originally written for Samizdat project)
3
+ #
4
+ # Copyright (c) 2002-2009 Dmitry Borodaenko <angdraug@debian.org>
5
+ #
6
+ # This program is free software.
7
+ # You can distribute/modify this program under the terms of
8
+ # the GNU General Public License version 3 or later.
9
+ #
10
+ # see doc/rdf-storage.txt for introduction and Graffiti Squish definition;
11
+ # see doc/storage-impl.txt for explanation of implemented algorithms
12
+ #
13
+ # vim: et sw=2 sts=2 ts=8 tw=0
14
+
15
+ module Graffiti
16
+
17
+ # raised for syntax errors in Squish statements
18
+ class ProgrammingError < RuntimeError; end
19
+
20
+ end
@@ -0,0 +1,78 @@
1
+ # Graffiti RDF Store
2
+ # (originally written for Samizdat project)
3
+ #
4
+ # Copyright (c) 2002-2011 Dmitry Borodaenko <angdraug@debian.org>
5
+ #
6
+ # This program is free software.
7
+ # You can distribute/modify this program under the terms of
8
+ # the GNU General Public License version 3 or later.
9
+ #
10
+ # see doc/rdf-storage.txt for introduction and Graffiti Squish definition;
11
+ # see doc/storage-impl.txt for explanation of implemented algorithms
12
+ #
13
+ # vim: et sw=2 sts=2 ts=8 tw=0
14
+
15
+ require 'graffiti/rdf_property_map'
16
+
17
+ module Graffiti
18
+
19
+ # Configuration of relational RDF storage (see examples)
20
+ #
21
+ class RdfConfig
22
+ def initialize(config)
23
+ @ns = config['ns']
24
+
25
+ @map = {}
26
+
27
+ config['map'].each_pair do |p, m|
28
+ table, field = m.to_a.first
29
+ p = ns_expand(p)
30
+ @map[p] = RdfPropertyMap.new(p, table, field)
31
+ end
32
+
33
+ if config['subproperties'].kind_of? Hash
34
+ config['subproperties'].each_pair do |p, subproperties|
35
+ p = ns_expand(p)
36
+ map = @map[p] or raise RuntimeError,
37
+ "Incorrect RDF storage configuration: superproperty #{p} must be mapped"
38
+ map.superproperty = true
39
+
40
+ qualifier = RdfPropertyMap.qualifier_property(p)
41
+ @map[qualifier] = RdfPropertyMap.new(
42
+ qualifier, map.table, RdfPropertyMap.qualifier_field(map.field))
43
+
44
+ subproperties.each do |subp|
45
+ subp = ns_expand(subp)
46
+ @map[subp] = RdfPropertyMap.new(subp, map.table, map.field)
47
+ @map[subp].subproperty_of = p
48
+ end
49
+ end
50
+ end
51
+
52
+ if config['transitive_closure'].kind_of? Hash
53
+ config['transitive_closure'].each_pair do |p, table|
54
+ @map[ ns_expand(p) ].transitive_closure = table
55
+
56
+ if config['subproperties'].kind_of?(Hash) and config['subproperties'][p]
57
+ config['subproperties'][p].each do |subp|
58
+ @map[ ns_expand(subp) ].transitive_closure = table
59
+ end
60
+ end
61
+ end
62
+ end
63
+ end
64
+
65
+ # hash of namespaces
66
+ attr_reader :ns
67
+
68
+ # map internal property names with expanded namespaces to RdfPropertyMap
69
+ # objects
70
+ #
71
+ attr_reader :map
72
+
73
+ def ns_expand(p)
74
+ p and p.sub(/\A(\S+?)::/) { @ns[$1] }
75
+ end
76
+ end
77
+
78
+ end
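Note on the configuration shape read by `RdfConfig#initialize`: it expects a hash with `ns`, `map`, and optionally `subproperties` and `transitive_closure` keys. The sketch below builds such a hash and applies the same `prefix::name` expansion rule as `ns_expand`. All prefixes, property names, and table names are invented for illustration, and `expand` is a standalone copy of the rule rather than the gem's method.

```ruby
# Hypothetical configuration in the shape RdfConfig#initialize iterates over.
config = {
  'ns' => {
    'dc' => 'http://purl.org/dc/elements/1.1/',
    'ex' => 'http://example.org/schema#'
  },
  # property => { table => field }
  'map' => {
    'dc::title'  => { 'message'  => 'title' },
    'ex::partOf' => { 'resource' => 'part_of' }
  },
  # superproperty => list of subproperties stored in the same field
  'subproperties' => { 'ex::partOf' => ['ex::chapterOf'] },
  # transitive property => name of its transitive closure table
  'transitive_closure' => { 'ex::partOf' => 'part' }
}

# Same expansion rule as ns_expand, with the namespace hash passed in.
def expand(name, namespaces)
  name && name.sub(/\A(\S+?)::/) { namespaces[$1] }
end

expanded = config['map'].map do |property, target|
  table, field = target.to_a.first # single-entry hash => [table, field]
  [expand(property, config['ns']), table, field]
end

expanded.each { |row| puts row.inspect }
```

Because the block's return value is converted with `to_s`, a prefix that is missing from `ns` expands to an empty string rather than raising, so a typo in a prefix silently yields a bare local name.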
@@ -0,0 +1,92 @@
1
+ # Graffiti RDF Store
2
+ # (originally written for Samizdat project)
3
+ #
4
+ # Copyright (c) 2002-2011 Dmitry Borodaenko <angdraug@debian.org>
5
+ #
6
+ # This program is free software.
7
+ # You can distribute/modify this program under the terms of
8
+ # the GNU General Public License version 3 or later.
9
+ #
10
+ # see doc/rdf-storage.txt for introduction and Graffiti Squish definition;
11
+ # see doc/storage-impl.txt for explanation of implemented algorithms
12
+ #
13
+ # vim: et sw=2 sts=2 ts=8 tw=0
14
+
15
+ module Graffiti
16
+
17
+ # Map of an internal RDF property into relational storage
18
+ #
19
+ class RdfPropertyMap
20
+
21
+ # special qualifier map
22
+ #
23
+ # ' ' is added to the property name to make sure it can't clash with any
24
+ # valid property uriref
25
+ #
26
+ def RdfPropertyMap.qualifier_property(property, type = 'subproperty')
27
+ property + ' ' + type
28
+ end
29
+
30
+ # special qualifier field
31
+ #
32
+ def RdfPropertyMap.qualifier_field(field, type = 'subproperty')
33
+ field + '_' + type
34
+ end
35
+
36
+ def initialize(property, table, field)
37
+ # fixme: support ambiguous mappings
38
+ @property = property
39
+ @table = table
40
+ @field = field
41
+ end
42
+
43
+ # expanded uriref of the mapped property
44
+ #
45
+ attr_reader :property
46
+
47
+ # name of the table into which the property is mapped (property domain is an
48
+ # internal resource class mapped into this table)
49
+ #
50
+ attr_reader :table
51
+
52
+ # name of the field into which the property is mapped
53
+ #
54
+ # if property range is not a literal, the field is a reference to the
55
+ # resource table
56
+ #
57
+ attr_reader :field
58
+
59
+ # expanded uriref of the property which this property is a subproperty of
60
+ #
61
+ # if set, this property maps into the same table and field as its
62
+ # superproperty, and is qualified by an additional field named
63
+ # <field>_subproperty which refers to a uriref resource holding uriref of
64
+ # this subproperty
65
+ #
66
+ attr_accessor :subproperty_of
67
+
68
+ attr_writer :superproperty
69
+
70
+ # returns +true+ if this property has subproperties
71
+ #
72
+ def superproperty?
73
+ @superproperty or false
74
+ end
75
+
76
+ # name of transitive closure table for a transitive property
77
+ #
78
+ # the format of a transitive closure table is:
79
+ #
80
+ # - 'resource' field refers to the subject resource id
81
+ # - '<field>' property field and '<field>_subproperty' qualifier field (in
82
+ # case of subproperty) have the same name as in the main table
83
+ # - 'distance' field holds the distance from subject to object in the RDF
84
+ # graph
85
+ #
86
+ # the transitive closure table is automatically updated by a trigger on every
87
+ # update of the main table
88
+ #
89
+ attr_accessor :transitive_closure
90
+ end
91
+
92
+ end
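Note on the qualifier naming used by `RdfPropertyMap`: the two class methods derive a map key and a column name from a superproperty. The sketch below repeats the two string rules as free functions and lists the columns that the comment on `transitive_closure` describes. The property URI and the `part_of` field are hypothetical.

```ruby
# Standalone copies of the two naming rules shown in the diff.
def qualifier_property(property, type = 'subproperty')
  property + ' ' + type # a space cannot occur in a valid uriref, so no clash
end

def qualifier_field(field, type = 'subproperty')
  field + '_' + type
end

superproperty = 'http://example.org/schema#partOf' # hypothetical property
field = 'part_of'                                  # hypothetical column

map_key = qualifier_property(superproperty)
qualifier_column = qualifier_field(field)

# Columns of a transitive closure table, following the comment in the diff.
closure_columns = ['resource', field, qualifier_column, 'distance']

puts map_key
puts closure_columns.inspect
```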
@@ -0,0 +1,916 @@
1
+ # Graffiti RDF Store
2
+ # (originally written for Samizdat project)
3
+ #
4
+ # Copyright (c) 2002-2011 Dmitry Borodaenko <angdraug@debian.org>
5
+ #
6
+ # This program is free software.
7
+ # You can distribute/modify this program under the terms of
8
+ # the GNU General Public License version 3 or later.
9
+ #
10
+ # see doc/rdf-storage.txt for introduction and Graffiti Squish definition;
11
+ # see doc/storage-impl.txt for explanation of implemented algorithms
12
+ #
13
+ # vim: et sw=2 sts=2 ts=8 tw=0
14
+
15
+ require 'delegate'
16
+ require 'uri/common'
17
+ require 'graffiti/rdf_property_map'
18
+ require 'graffiti/squish'
19
+
20
+ module Graffiti
21
+
22
+ class SqlNodeBinding
23
+ def initialize(table_alias, field)
24
+ @alias = table_alias
25
+ @field = field
26
+ end
27
+
28
+ attr_reader :alias, :field
29
+
30
+ def to_s
31
+ @alias + '.' + @field
32
+ end
33
+
34
+ alias :inspect :to_s
35
+
36
+ def eql?(binding)
37
+ @alias == binding.alias and @field == binding.field
38
+ end
39
+
40
+ alias :'==' :eql?
41
+
42
+ def hash
43
+ self.to_s.hash
44
+ end
45
+ end
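Note on why `SqlNodeBinding` defines `eql?`, `==`, and `hash` together: hash lookups use `eql?` and `hash`, while `Array#include?` uses `==`. The sketch below shows the pattern with a stand-in class, `ColumnRef`, which is hypothetical and only mirrors the idea.

```ruby
# Value object identified by its alias and field (stand-in class, not the gem's).
class ColumnRef
  attr_reader :table_alias, :field

  def initialize(table_alias, field)
    @table_alias = table_alias
    @field = field
  end

  def to_s
    "#{@table_alias}.#{@field}"
  end

  def eql?(other)
    other.is_a?(ColumnRef) &&
      @table_alias == other.table_alias && @field == other.field
  end

  alias_method :==, :eql?

  # Equal objects must produce equal hashes for Hash lookups to find them.
  def hash
    to_s.hash
  end
end

rebind = { ColumnRef.new('a', 'id') => ColumnRef.new('_subquery_a', 'a_id') }

# A different object with the same alias and field finds the same entry.
lookup = rebind[ColumnRef.new('a', 'id')]
known = [ColumnRef.new('b', 'title')].include?(ColumnRef.new('b', 'title'))

puts lookup
puts known
```

Without the `hash` override, the lookup would return `nil`, because two distinct objects would land in different hash buckets even though `eql?` says they are equal.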
46
+
47
+
48
+ class SqlExpression < DelegateClass(Array)
49
+ def initialize(*parts)
50
+ super parts
51
+ end
52
+
53
+ def to_s
54
+ '(' << self.join(' ') << ')'
55
+ end
56
+
57
+ alias :to_str :to_s
58
+
59
+ def traverse(&block)
60
+ self.each do |part|
61
+ case part
62
+ when SqlExpression
63
+ part.traverse(&block)
64
+ else
65
+ yield part
66
+ end
67
+ end
68
+ end
69
+
70
+ def rebind!(rebind, &block)
71
+ self.each_with_index do |part, i|
72
+ case part
73
+ when SqlExpression
74
+ part.rebind!(rebind, &block)
75
+ when SqlNodeBinding
76
+ if rebind[part]
77
+ self[i] = rebind[part]
78
+ yield part if block_given?
79
+ end
80
+ end
81
+ end
82
+ end
83
+
84
+ alias :eql? :'=='
85
+
86
+ def hash
87
+ self.to_s.hash
88
+ end
89
+ end
90
+
91
+
92
+ # Transform RDF query pattern graph into a relational join expression.
93
+ #
94
+ class SqlMapper
95
+ include Debug
96
+
97
+ def initialize(config, pattern, negative = [], optional = [], global_filter = '')
98
+ @config = config
99
+ @global_filter = global_filter
100
+
101
+ check_graph(pattern)
102
+ negative.empty? or check_graph(pattern + negative)
103
+ optional.empty? or check_graph(pattern + optional)
104
+
105
+ map_predicates(pattern, negative, optional)
106
+ transform
107
+ generate_tables_and_conditions
108
+
109
+ @jc = @aliases = @ac = @global_filter = nil
110
+ end
111
+
112
+ # map clause position to table, field, and table alias
113
+ #
114
+ # position => {
115
+ # :subject => {
116
+ # :node => node,
117
+ # :field => field
118
+ # },
119
+ # :object => {
120
+ # :node => node,
121
+ # :field => field
122
+ # },
123
+ # :map => RdfPropertyMap,
124
+ # :bind_mode => < :must_bind | :may_bind | :must_not_bind >,
125
+ # :alias => alias
126
+ # }
127
+ #
128
+ attr_reader :clauses
129
+
130
+ # map node to list of positions in clauses
131
+ #
132
+ # node => {
133
+ # :positions => [
134
+ # { :clause => position, :role => < :subject | :object > }
135
+ # ],
136
+ # :bind_mode => < :must_bind | :may_bind | :must_not_bind >,
137
+ # :colors => { color1 => bind_mode1, ... },
138
+ # :ground => < true | false >
139
+ # }
140
+ #
141
+ attr_reader :nodes
142
+
143
+ # list of tables for FROM clause of SQL query
144
+ attr_reader :from
145
+
146
+ # conditions for WHERE clause of SQL query
147
+ attr_reader :where
148
+
149
+ # return node's binding, raise exception if the node isn't bound
150
+ #
151
+ def bind(node)
152
+ (@nodes[node] and @bindings[node] and (binding = @bindings[node].first)
153
+ ) or raise ProgrammingError,
154
+ "Node '#{node}' is not bound by the query pattern"
155
+
156
+ @nodes[node][:positions].each do |p|
157
+ if :object == p[:role] and @clauses[ p[:clause] ][:map].subproperty_of
158
+
159
+ property = @clauses[ p[:clause] ][:map].property
160
+ return %{select_subproperty(#{binding}, #{bind(property)})}
161
+ end
162
+ end
163
+
164
+ binding
165
+ end
166
+
167
+ private
168
+
169
+ # Check that the pattern is a connected graph (every node is reachable
170
+ # from any one node when edge direction is ignored).
171
+ #
172
+ def check_graph(pattern)
173
+ nodes = pattern.transpose[1, 2].flatten.uniq # all nodes
174
+
175
+ seen = [ nodes.shift ]
176
+ found_more = true
177
+
178
+ while found_more and not nodes.empty?
179
+ found_more = false
180
+
181
+ pattern.each do |predicate, subject, object|
182
+
183
+ if seen.include?(subject) and nodes.include?(object)
184
+ seen.push(object)
185
+ nodes.delete(object)
186
+ found_more = true
187
+
188
+ elsif seen.include?(object) and nodes.include?(subject)
189
+ seen.push(subject)
190
+ nodes.delete(subject)
191
+ found_more = true
192
+ end
193
+ end
194
+ end
195
+
196
+ nodes.empty? or raise ProgrammingError, "Query pattern is a disjoint graph"
197
+ end
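Note on the connectivity check: `check_graph` repeats passes over the pattern until no new node is reached. The same property can be tested with a breadth-first search, sketched below. This is an alternative formulation for illustration, not the method in the diff; `connected?` is a hypothetical helper that returns a boolean instead of raising.

```ruby
# Return true when every node of the pattern is reachable from the first one,
# treating each [predicate, subject, object] clause as an undirected edge.
def connected?(pattern)
  nodes = pattern.flat_map { |_predicate, subject, object| [subject, object] }.uniq
  return true if nodes.empty?

  seen = [nodes.first]
  frontier = [nodes.first]

  until frontier.empty?
    current = frontier.shift
    pattern.each do |_predicate, subject, object|
      neighbor =
        if subject == current then object
        elsif object == current then subject
        end
      next if neighbor.nil? || seen.include?(neighbor)

      seen << neighbor
      frontier << neighbor
    end
  end

  (nodes - seen).empty?
end

joined = connected?([['p', '?a', '?b'], ['q', '?c', '?b']])
split  = connected?([['p', '?a', '?b'], ['q', '?c', '?d']])

puts joined
puts split
```

The second clause of the first pattern points from `?c` to `?b`, so it only connects because edge direction is ignored.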
198
+
199
+ # Stage 1: Predicate Mapping (storage-impl.txt).
200
+ #
201
+ def map_predicates(pattern, negative, optional)
202
+ @nodes = {}
203
+ @clauses = []
204
+
205
+ map_pattern(pattern, :must_bind)
206
+ map_pattern(negative, :must_not_bind)
207
+ map_pattern(optional, :may_bind)
208
+
209
+ @color_counter = @must_bind_nodes = nil
210
+
211
+ refine_ambiguous_properties
212
+
213
+ debug do
214
+ @nodes.each do |node, n|
215
+ debug %{#{node}: #{n[:bind_mode]} #{n[:colors].inspect}}
216
+ end
217
+ end
218
+ end
219
+
220
+ # Label every connected component of the pattern with a different color.
221
+ #
222
+ # Pattern clause positions:
223
+ #
224
+ # 0. predicate
225
+ # 1. subject
226
+ # 2. object
227
+ # 3. filter
228
+ #
229
+ # Returns hash of node colors.
230
+ #
231
+ # Implements the {Two-pass Connected Component Labeling algorithm}
232
+ # [http://en.wikipedia.org/wiki/Connected_Component_Labeling#Two-pass]
233
+ # with an added special case to exclude _alien_nodes_ from neighbor lists.
234
+ #
235
+ # The special case ensures that parts of a may-bind or must-not-bind
236
+ # subpattern that are only connected through a must-bind node do not connect.
237
+ #
238
+ def label_pattern_components(pattern, alien_nodes, augment_alien_nodes = false)
239
+ return {} if pattern.empty?
240
+
241
+ color = {}
242
+ color_eq = [] # [ [ smaller, larger ], ... ]
243
+ nodes = pattern.transpose[1, 2].flatten.uniq
244
+ alien_nodes_here = nodes & alien_nodes
245
+
246
+ @color_counter = @color_counter ? @color_counter.next : 0
247
+ color[ nodes[0] ] = @color_counter
248
+
249
+ # first pass
250
+ 1.upto(nodes.size - 1) do |i|
251
+ node = nodes[i]
252
+
253
+ pattern.each do |predicate, subject, object, filter|
254
+ if node == subject
255
+ neighbor = object
256
+ elsif node == object
257
+ neighbor = subject
258
+ end
259
+ next if neighbor.nil? or color[neighbor].nil? or
260
+ alien_nodes_here.include?(neighbor)
261
+
262
+ if color[node].nil?
263
+ color[node] = color[neighbor]
264
+ elsif color[node] != color[neighbor] # record color equivalence
265
+ color_eq |= [ [ color[node], color[neighbor] ].sort ]
266
+ end
267
+ end
268
+
269
+ color[node] ||= (@color_counter += 1)
270
+ end
271
+
272
+ # second pass
273
+ nodes.each do |node|
274
+ while eq = color_eq.rassoc(color[node])
275
+ color[node] = eq[0]
276
+ end
277
+ end
278
+
279
+ alien_nodes.push(*nodes).uniq! if augment_alien_nodes
280
+
281
+ color
282
+ end
283
+
284
+ def map_pattern(pattern, bind_mode = :must_bind)
285
+ pattern = pattern.dup
286
+ @must_bind_nodes ||= []
287
+ color = label_pattern_components(pattern, @must_bind_nodes, :must_bind == bind_mode)
288
+
289
+ pattern.each do |predicate, subject, object, filter, transitive|
290
+
291
+ # validate the triple
292
+ predicate =~ URI::URI_REF or raise ProgrammingError,
293
+ "Valid uriref expected in predicate position instead of '#{predicate}'"
294
+
295
+ [subject, object].each do |node|
296
+ node =~ SquishQuery::INTERNAL or
297
+ node =~ SquishQuery::BN or
298
+ node =~ URI::URI_REF or
299
+ raise ProgrammingError,
300
+ "Resource or blank node name expected instead of '#{node}'"
301
+ end
302
+
303
+ # list of possible mappings into internal tables
304
+ map = @config.map[predicate]
305
+
306
+ if transitive and (map.nil? or map.transitive_closure.nil?)
307
+ raise ProgrammingError,
308
+ "No transitive closure is defined for #{predicate} property"
309
+ end
310
+
311
+ if map and
312
+ (subject =~ SquishQuery::BN or
313
+ subject =~ SquishQuery::INTERNAL or
314
+ subject =~ SquishQuery::PARAMETER or
315
+ 'resource' == map.table)
316
+ # internal predicate and subject is mappable to resource table
317
+
318
+ i = clauses.size
319
+
320
+ @clauses[i] = {
321
+ :subject => [ { :node => subject, :field => 'id' } ],
322
+ :object => [ { :node => object, :field => map.field } ],
323
+ :map => map,
324
+ :transitive => transitive,
325
+ :bind_mode => bind_mode
326
+ }
327
+ @clauses[i][:filter] = SqlExpression.new(filter) if filter
328
+
329
+ [subject, object].each do |node|
330
+ if @nodes[node]
331
+ @nodes[node][:bind_mode] =
332
+ stronger_bind_mode(@nodes[node][:bind_mode], bind_mode)
333
+ else
334
+ @nodes[node] = { :positions => [], :bind_mode => bind_mode, :colors => {} }
335
+ end
336
+
337
+ # set of node colors, one for each bind_mode
338
+ @nodes[node][:colors][ color[node] ] = bind_mode
339
+ end
340
+
341
+ # reverse mapping of the node occurrences
342
+ @nodes[subject][:positions].push( { :clause => i, :role => :subject } )
343
+ @nodes[object][:positions].push( { :clause => i, :role => :object } )
344
+
345
+ if superp = map.subproperty_of
346
+ # link subproperty qualifier into the pattern
347
+ pattern.push(
348
+ [RdfPropertyMap.qualifier_property(superp), subject, predicate])
349
+ color[predicate] = color[object]
350
+
351
+ # no need to ground both subproperty and superproperty
352
+ @nodes[object][:ground] = true
353
+ end
354
+
355
+ else
356
+ # assume reification for unmapped predicates:
357
+ #
358
+ # | (rdf::predicate ?_stmt_#{i} p)
359
+ # (p s o) -> | (rdf::subject ?_stmt_#{i} s)
360
+ # | (rdf::object ?_stmt_#{i} o)
361
+ #
362
+ rdf = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
363
+ stmt = "?_stmt_#{i}"
364
+ pattern.push([rdf + 'predicate', stmt, predicate],
365
+ [rdf + 'subject', stmt, subject],
366
+ [rdf + 'object', stmt, object])
367
+ color[stmt] = color[predicate] = color[object]
368
+ end
369
+ end
370
+ end
371
+
372
+ # Select strongest of the two bind modes, in the following order of
373
+ # preference:
374
+ #
375
+ # :must_bind -> :must_not_bind -> :may_bind
376
+ #
377
+ def stronger_bind_mode(mode1, mode2)
378
+ if mode1 != mode2 and (:must_bind == mode2 or :may_bind == mode1)
379
+ mode2
380
+ else
381
+ mode1
382
+ end
383
+ end
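Note on `stronger_bind_mode`: the condition is compact, so it helps to check it against the documented order of preference for every pair of modes. The sketch below copies the method body from the diff as a free function; the `RANK` constant and the comparison against it are added for illustration.

```ruby
# Copy of the method above, as a free function.
def stronger_bind_mode(mode1, mode2)
  if mode1 != mode2 and (:must_bind == mode2 or :may_bind == mode1)
    mode2
  else
    mode1
  end
end

# Documented order of preference, strongest first.
RANK = [:must_bind, :must_not_bind, :may_bind].freeze

# For every pair, compare the result with the higher-ranked of the two modes.
mismatches = RANK.product(RANK).reject do |mode1, mode2|
  expected = [mode1, mode2].min_by { |mode| RANK.index(mode) }
  stronger_bind_mode(mode1, mode2) == expected
end

puts mismatches.inspect
puts stronger_bind_mode(:may_bind, :must_not_bind)
```

An empty `mismatches` list means the condition agrees with the documented order for all nine combinations.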
384
+
385
+ # If a node can be mapped to more than one [table, field] pair, see if it can
386
+ # be refined based on other occurrences of this node in other query clauses.
387
+ #
388
+ def refine_ambiguous_properties
389
+ @nodes.each_value do |n|
390
+ map = n[:positions]
391
+
392
+ map.each_with_index do |p, i|
393
+ big = @clauses[ p[:clause] ][ p[:role] ]
394
+ next if big.size <= 1 # no refining needed
395
+
396
+ debug { n.inspect + ': ' + big.inspect }
397
+
398
+ (i + 1).upto(map.size - 1) do |j|
399
+ small_p = map[j]
400
+ small = @clauses[ small_p[:clause] ][ small_p[:role] ]
401
+
402
+ refined = big & small
403
+ if refined.size > 0 and refined.size < big.size
404
+
405
+ # refine the node...
406
+ @clauses[ p[:clause] ][ p[:role] ] = big = refined
407
+
408
+ # ...and its pair
409
+ @clauses[ p[:clause] ][ opposite_role(p[:role]) ].collect! {|pair|
410
+ refined.assoc(pair[0]) ? pair : nil
411
+ }.compact!
412
+ end
413
+ end
414
+ end
415
+ end
416
+
417
+ # drop remaining ambiguous mappings
418
+ # todo: split query for ambiguous mappings
419
+ @clauses.each do |clause|
420
+ next if clause.nil? # means it was reified
421
+ clause[:subject] = clause[:subject].first
422
+ clause[:object] = clause[:object].first
423
+ end
424
+ end
425
+
426
+ def opposite_role(role)
427
+ :subject == role ? :object : :subject
428
+ end
429
+
430
+ # Return current value of alias counter, remember which table it was assigned
431
+ # to, and increment the counter.
432
+ #
433
+ def next_alias(table, node, bind_mode = @nodes[node][:bind_mode])
434
+ @ac ||= 'a'
435
+ @aliases ||= {}
436
+
437
+ a = @ac.dup
438
+ @aliases[a] = {
439
+ :table => table,
440
+ :node => node,
441
+ :bind_mode => bind_mode,
442
+ :filter => []
443
+ }
444
+
445
+ @ac.next!
446
+ return a
447
+ end
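Note on the alias counter: `next_alias` relies on `String#next!`, which advances `'a'` to `'b'` and rolls `'z'` over to `'aa'`, and it takes a `dup` first because `next!` mutates the string in place. A minimal sketch of that sequence, using local variables in place of the instance state:

```ruby
counter = String.new('a') # explicit unfrozen string, since next! mutates it
aliases = []

28.times do
  aliases << counter.dup # copy before mutating, or every entry would change
  counter.next!
end

puts aliases.first(3).inspect
puts aliases.last(3).inspect
```

The counter never runs out of names, so a query with more than 26 table aliases continues with two-letter aliases.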
448
+
449
+ def define_relation_aliases
450
+ @nodes.each do |node, n|
451
+
452
+ positions = n[:positions]
453
+
454
+ # go through all clauses with this node in subject position
455
+ positions.each_with_index do |p, i|
456
+ next if :subject != p[:role] or @clauses[ p[:clause] ][:alias]
457
+
458
+ clause = @clauses[ p[:clause] ]
459
+ map = clause[:map]
460
+ table = clause[:transitive] ? map.transitive_closure : map.table
461
+
462
+ # see if we've already mapped this node to the same table before
463
+ 0.upto(i - 1) do |j|
464
+ similar_clause = @clauses[ positions[j][:clause] ]
465
+
466
+ if similar_clause[:alias] and
467
+ similar_clause[:map].table == table and
468
+ similar_clause[:map].field != map.field
469
+ # same node, same table, different field -> same alias
470
+
471
+ clause[:alias] = similar_clause[:alias]
472
+ break
473
+ end
474
+ end
475
+
476
+ if clause[:alias].nil?
477
+ clause[:alias] =
478
+ if clause[:transitive]
479
+ # transitive clause bind mode overrides a stronger node bind mode
480
+ #
481
+ # fixme: generic case for multiple aliases per node
482
+ next_alias(table, node, clause[:bind_mode])
483
+ else
484
+ next_alias(table, node)
485
+ end
486
+ end
487
+ end
488
+ end # optimize: unnecessary aliases are generated
489
+ end
490
+
491
+ def update_alias_filters
492
+ @clauses.each do |c|
493
+ if c[:filter]
494
+ @aliases[ c[:alias] ][:filter].push(c[:filter])
495
+ end
496
+ end
497
+ end
498
+
499
+ # Stage 2: Relation Aliases and Join Conditions (storage-impl.txt).
500
+ #
501
+ # Result is map of aliases in @aliases and list of join conditions in @jc.
502
+ #
503
+ def transform
504
+ define_relation_aliases
505
+ update_alias_filters
506
+
507
+ # [ [ binding1, binding2 ], ... ]
508
+ @jc = []
509
+ @bindings = {}
510
+
511
+ @nodes.each do |node, n|
512
+ positions = n[:positions]
513
+
514
+ # node binding
515
+ first = positions.first
516
+ clause = @clauses[ first[:clause] ]
517
+ a = clause[:alias]
518
+ binding = SqlNodeBinding.new(a, clause[ first[:role] ][:field])
519
+ @bindings[node] = [ binding ]
520
+
521
+ # join conditions
522
+ 1.upto(positions.size - 1) do |i|
523
+ p = positions[i]
524
+ clause2 = @clauses[ p[:clause] ]
525
+ binding2 = SqlNodeBinding.new(clause2[:alias], clause2[ p[:role] ][:field])
526
+
527
+ unless @bindings[node].include?(binding2)
528
+ @bindings[node].push(binding2)
529
+ @jc.push([binding, binding2, node])
530
+ n[:ground] = true
531
+ end
532
+ end
533
+
534
+ # ground non-blank nodes
535
+ if node !~ SquishQuery::BN
536
+
537
+ if node =~ SquishQuery::INTERNAL # internal resource id
538
+ @aliases[a][:filter].push SqlExpression.new(binding, '=', $1)
539
+
540
+ elsif node =~ SquishQuery::PARAMETER or node =~ SquishQuery::LITERAL
541
+ @aliases[a][:filter].push SqlExpression.new(binding, '=', node)
542
+
543
+ elsif node =~ URI::URI_REF # external resource uriref
544
+
545
+ r = nil
546
+ positions.each do |p|
547
+ next unless :subject == p[:role]
548
+
549
+ c = @clauses[ p[:clause] ]
550
+ if 'resource' == c[:map].table
551
+ r = c[:alias] # reuse existing mapping to resource table
552
+ break
553
+ end
554
+ end
555
+
556
+ if r.nil?
557
+ r = next_alias('resource', node)
558
+ r_binding = SqlNodeBinding.new(r, 'id')
559
+ @bindings[node].unshift(r_binding)
560
+ @jc.push([ binding, r_binding, node ])
561
+ end
562
+
563
+ @aliases[r][:filter].push SqlExpression.new(
564
+ SqlNodeBinding.new(r, 'uriref'), '=', "'t'", 'AND',
565
+ SqlNodeBinding.new(r, 'label'), '=', %{'#{node}'})
566
+
567
+ else
568
+ raise RuntimeError,
569
+ "Invalid node '#{node}' should never occur at this point"
570
+ end
571
+
572
+ n[:ground] = true
573
+ end
574
+ end
575
+
576
+ debug do
577
+ @aliases.each {|alias_name, a| debug %{#{alias_name}: #{a.inspect}} }
578
+ @jc.each {|jc| debug jc.inspect }
579
+ end
580
+ end
581
+
+   # Produce SQL FROM and WHERE clauses from results of transform().
+   #
+   def generate_tables_and_conditions
+     main_path, seen = jc_subgraph_path(:must_bind)
+     debug { main_path.inspect }
+
+     main_path and not main_path.empty? or raise RuntimeError,
+       'Failed to find table aliases for main query'
+
+     @where = ground_dangling_blank_nodes(main_path)
+
+     joins = ''
+     subquery_count = 'a'
+
+     [ :must_not_bind, :may_bind ].each do |bind_mode|
+       loop do
+         sub_path, new = jc_subgraph_path(bind_mode, seen)
+         break if sub_path.nil? or sub_path.empty?
+
+         debug { sub_path.inspect }
+
+         sub_query, sub_join = sub_path.partition {|a,| main_path.assoc(a).nil? }
+         # fixme: make sure that sub_join is not empty
+
+         if 1 == sub_query.size
+           # simplified case: join single table directly without a subquery
+           join_alias, = sub_query.first
+           a = @aliases[join_alias]
+           join_target = a[:table]
+           join_conditions = jc_path_to_join_conditions(sub_join) + a[:filter]
+
+         else
+           # left join subquery to the main query
+           join_alias = '_subquery_' << subquery_count
+           subquery_count.next!
+
+           sub_join = subquery_jc_path(sub_join, join_alias)
+           rebind = rebind_subquery(sub_path, join_alias)
+           select_nodes = subquery_select_nodes(rebind, main_path, sub_join)
+
+           join_conditions = jc_path_to_join_conditions(sub_join, rebind,
+             select_nodes)
+
+           select_nodes = select_nodes.keys.collect {|b|
+             b.to_s << ' AS ' << rebind[b].field
+           }.join(', ')
+
+           tables, conditions = jc_path_to_tables_and_conditions(sub_path)
+
+           join_target = "(\nSELECT #{select_nodes}\nFROM #{tables}"
+           join_target << "\nWHERE " << conditions unless conditions.empty?
+           join_target << "\n)"
+           join_target.gsub!(/\n(?!\)\z)/, "\n ")
+         end
+
+         joins << ("\nLEFT JOIN " + join_target + ' AS ' + join_alias + ' ON ' +
+           join_conditions.uniq.join(' AND '))
+
+         if :must_not_bind == bind_mode
+           left_join_is_null(main_path, sub_join)
+         end
+       end
+     end
+
+     @from, main_where = jc_path_to_tables_and_conditions(main_path)
+
+     @from << joins
+
+     @where.push('(' + main_where + ')') unless main_where.empty?
+     @where.push('(' + @global_filter + ')') unless @global_filter.empty?
+     @where = @where.join("\nAND ")
+   end
+
+   # Produce a subgraph path through join conditions linking all aliases with
+   # the given _bind_mode_ that form a same-color connected component of the
+   # join conditions graph and weren't processed yet:
+   #
+   #   path = [ [start, []], [ next, [ jc, ... ] ], ... ]
+   #
+   # Update _seen_ hash for all aliases included in the produced path.
+   #
+   def jc_subgraph_path(bind_mode, seen = {})
+     start = find_alias(bind_mode, seen)
+     return nil if start.nil?
+
+     new = {}
+     new[start] = true
+     path = [ [start, []] ]
+     colors = @nodes[ @aliases[start][:node] ][:colors].keys
+
+     loop do   # while we can find more connecting joins of the same color
+       join_alias = nil
+
+       @jc.each do |jc|
+         # use cases:
+         # - seen is empty (composing the must-bind join)
+         # - seen is not empty (composing a subquery)
+
+         next if (colors & @nodes[ jc[2] ][:colors].keys).empty?
+
+         0.upto(1) do |i|
+           a_seen = jc[i].alias
+           a_next = jc[1-i].alias
+
+           if not new[a_next] and (
+             ((new[a_seen] or seen[a_seen]) and
+               (@aliases[a_next][:bind_mode] == bind_mode)
+               # connect an untouched node of matching bind mode
+             ) or (
+               new[a_seen] and seen[a_next] and
+               # connect subquery to the rest of the query...
+               @aliases[a_seen][:bind_mode] == bind_mode
+               # ...but only go one step deep
+             ))
+
+             join_alias = a_next
+             break
+           end
+         end
+
+         break if join_alias
+       end
+
+       break if join_alias.nil?
+
+       # join it to all seen aliases
+       join_on = @jc.find_all do |jc|
+         a1, a2 = jc[0, 2].collect {|b| b.alias }
+         (new[a1] and a2 == join_alias) or (new[a2] and a1 == join_alias)
+       end
+
+       new[join_alias] = true
+       path.push([join_alias, join_on])
+     end
+
+     seen.merge!(new)
+     [ path, new ]
+   end
+
721
+ def find_alias(bind_mode, seen = {})
722
+ @aliases.each do |alias_name, a|
723
+ next if seen[alias_name] or a[:bind_mode] != bind_mode
724
+ return alias_name
725
+ end
726
+
727
+ nil
728
+ end
729
+
+   # Ground all must-bind blank nodes that weren't grounded elsewhere to an
+   # existential quantifier.
+   #
+   def ground_dangling_blank_nodes(main_path)
+     conditions = []
+     ground_nodes = @global_filter.scan(SquishQuery::BN_SCAN)
+
+     @nodes.each do |node, n|
+       next if (n[:ground] or ground_nodes.include?(node))
+
+       expression =
+         case n[:bind_mode]
+         when :must_bind
+           'IS NOT NULL'
+         when :must_not_bind
+           'IS NULL'
+         else
+           next
+         end
+
+       @bindings[node].each do |binding|
+         if main_path.assoc(binding.alias)
+           conditions.push SqlExpression.new(binding, expression)
+           break
+         end
+       end
+     end
+
+     conditions
+   end
+
+   # Join a subquery to the main query: for each alias shared between the two,
+   # link the 'id' field of the corresponding table within and outside the
+   # subquery. If no node is bound to the 'id' field, create a virtual node
+   # bound to it, so that it can be rebound by rebind_subquery().
+   #
+   def subquery_jc_path(sub_join, join_alias)
+     sub_join.empty? and raise ProgrammingError,
+       "Unexpected empty subquery, check your RDF storage configuration"
+     # fixme: reify instead of raising an exception
+
+     sub_join.transpose[0].uniq.collect do |a|
+       binding = SqlNodeBinding.new(a, 'id')
+
+       exists = false
+       @nodes.each do |node, n|
+         if @bindings[node].include?(binding)
+           exists = true
+           break
+         end
+       end
+
+       unless exists
+         node = '?' + join_alias + '_' + a
+         @nodes[node] = { :ground => true }
+         @bindings[node] = [ binding ]
+       end
+
+       [ a, [[ binding, binding ]] ]
+     end
+   end
+
+   # Generate a hash that maps all bindings that have been wrapped inside the
+   # _sub_path_ (a jc path, see jc_subgraph_path()) to rebound bindings based
+   # on the _join_alias_ so that they may still be used in the main query.
+   #
+   def rebind_subquery(sub_path, join_alias)
+     rebind = {}
+     field_count = 'a'
+
+     wrapped = {}
+     sub_path.each {|a,| wrapped[a] = true }
+
+     @nodes.each do |node, n|
+       @bindings[node].each do |b|
+         if wrapped[b.alias] and rebind[b].nil?
+           field = '_field_' << field_count
+           field_count.next!
+           rebind[b] = SqlNodeBinding.new(join_alias, field)
+         end
+       end
+     end
+
+     rebind
+   end
+
+   # Go through global filter, filters in the main query, and join conditions
+   # attaching the subquery to the main query, rebind the bindings for nodes
+   # wrapped inside the subquery, and return a hash with keys for all bindings
+   # that should be selected from the subquery.
+   #
+   def subquery_select_nodes(rebind, main_path, sub_join)
+     select_nodes = {}
+
+     # update the global filter
+     @nodes.each do |node, n|
+       if r = rebind[ @bindings[node].first ]
+         @global_filter.gsub!(/#{Regexp.escape(node)}\b/) do
+           select_nodes[ @bindings[node].first ] = true
+           r.to_s
+         end
+       end
+     end
+
+     # update filters in the main query
+     main_path.each do |a,|
+       next if sub_join.assoc(a)
+
+       @aliases[a][:filter].each do |f|
+         f.rebind!(rebind) do |b|
+           select_nodes[b] = true
+         end
+       end
+     end
+
+     # update the subquery join path
+     sub_join.each do |a, jcs|
+       jcs.each do |jc|
+         select_nodes[ jc[0] ] = true
+         jc[1] = rebind[ jc[1] ]
+       end
+     end
+
+     # fixme: update main SELECT list
+     select_nodes
+   end
+
+   # Transform jc path (see jc_subgraph_path()) into a list of join conditions.
+   # If _rebind_ and _select_nodes_ hashes are defined, conditions will be
+   # rebound accordingly, and _select_nodes_ will be updated to include bindings
+   # used in the conditions.
+   #
+   def jc_path_to_join_conditions(jc_path, rebind = nil, select_nodes = nil)
+     conditions = []
+
+     jc_path.each do |a, jcs|
+       jcs.each do |b1, b2, n|
+         conditions.push SqlExpression.new(b1, '=', b2)
+       end
+     end
+
+     conditions.empty? and raise RuntimeError,
+       "Failed to join subquery to the main query"
+
+     conditions
+   end
+
+   # Generate FROM and WHERE clauses from a jc path (see jc_subgraph_path()).
+   #
+   def jc_path_to_tables_and_conditions(path)
+     first, = path[0]
+     a = @aliases[first]
+
+     tables = a[:table] + ' AS ' + first
+     conditions = a[:filter]
+
+     path[1, path.size - 1].each do |join_alias, join_on|
+       a = @aliases[join_alias]
+
+       tables <<
+         %{\nINNER JOIN #{a[:table]} AS #{join_alias} ON } <<
+         (
+           join_on.collect {|b1, b2| SqlExpression.new(b1, '=', b2) } +
+           a[:filter]
+         ).uniq.join(' AND ')
+     end
+
+     [ tables, conditions.uniq.join("\nAND ") ]
+   end
+
+   # Find and declare as NULL key fields of a must-not-bind subquery.
+   #
+   def left_join_is_null(main_path, sub_join)
+     sub_join.each do |a, jcs|
+       jcs.each do |jc|
+         0.upto(1) do |i|
+           if main_path.assoc(jc[i].alias).nil?
+             @where.push SqlExpression.new(jc[i], 'IS NULL')
+             break
+           end
+         end
+       end
+     end
+   end
+ end
+
+ end