sql-ferret 0.4.0

Files changed (8)
  1. checksums.yaml +7 -0
  2. data/GPL-3 +674 -0
  3. data/History.txt +3 -0
  4. data/Manifest.txt +6 -0
  5. data/README +294 -0
  6. data/lib/sql-ferret.rb +1719 -0
  7. data/sql-ferret.gemspec +18 -0
  8. metadata +66 -0
@@ -0,0 +1,3 @@
1
+ === 0.4.0 / 2015-02-03
2
+
3
+ * First public release
@@ -0,0 +1,6 @@
1
+ GPL-3
2
+ History.txt
3
+ Manifest.txt
4
+ README
5
+ sql-ferret.gemspec
6
+ lib/sql-ferret.rb
data/README ADDED
@@ -0,0 +1,294 @@
1
+ The SQL Ferret wraps SQLite into a navigational database style
2
+ interface.
3
+
4
+ BEWARE: this is raw and ugly EXPERIMENTAL code, and its API is
5
+ VERY LIKELY to change before 1.0.
6
+
7
+
8
+ == Overview
9
+
10
+ The [[Ferret::new]] constructor takes two arguments: a
11
+ (multiline) string containing the Ferret Data Schema Description
12
+ and a previously opened [[SQLite3::Database]] instance for
13
+ accessing an SQLite database with such a schema. The resulting
14
+ [[Ferret]] instance's primary useful method is [[go]]; it takes
15
+ one mandatory argument -- the Ferret query string -- and may
16
+ also take numbered and named arguments, as well as a block.
17
+
18
+ Note that Ferret has TWO distinct DSLs: one for defining the
19
+ data model, one for querying and manipulating it. When the
20
+ Ferret schema is stored in a text file, it's customarily given a
21
+ name in the form of [[foo.fers]]. Ferret query expressions are
22
+ typically inlined in Ruby code.
23
+
24
+
25
+ == Ferret query language
26
+
27
+ The simplest form of a Ferret query expression is a query over
28
+ one table, filtering by one input column, and producing one
29
+ output column:
30
+
31
+ <table> ':' <input-field> '->' <output-field>
32
+
33
+ Such an expression corresponds to the SQL of
34
+
35
+ SELECT <output-field> FROM <table> WHERE <input-field> = ?
36
+
37
+ Note that in order to process such an expression, [[Ferret#go]]
38
+ requires an extra argument besides the expression -- the
39
+ exemplar value for [[<input-field>]]. The separation of
40
+ expression from data is a deliberate design feature of Ferret:
41
+ on one hand, it's believed to make the expressions clearer; on
42
+ the other, it provides a measure of protection against inadvertent
43
+ SQL injection vulnerabilities.
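As a rough illustration of the correspondence described above, the following sketch translates only the simplest expression form into SQL. The helper [[simple_ferret_to_sql]] is hypothetical and is not part of sql-ferret; the real parser handles far more. The point is that the exemplar value never becomes part of the SQL text; only a [[?]] placeholder does.

```ruby
# Hypothetical helper, not part of sql-ferret: handles only the
# simplest form, <table> ':' <input-field> '->' <output-field>.
def simple_ferret_to_sql expr
  m = /\A\s*(\w+)\s*:\s*(\w+)\s*->\s*(\w+)\s*\z/.match(expr) or
      raise ArgumentError, "unsupported expression: #{expr.inspect}"
  table, input, output = m.captures
  # The exemplar value is bound separately at execution time, so
  # only a placeholder appears in the generated SQL.
  "SELECT #{output} FROM #{table} WHERE #{input} = ?"
end

sql = simple_ferret_to_sql 'houses: number -> street'
```

The resulting string would then be handed to the database together with the exemplar value as a bound parameter.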
44
+
45
+ The <input-field> can be omitted. In such a case, the query
46
+ fetches all rows from <table>. If more than one input field is
47
+ supplied, they must be separated by commas and surrounded by
48
+ parentheses. (Rationale: Ferret's [[->]] operator
49
+ normally binds very weakly on its left-hand side. While it
50
+ would not be hard to write an exception into the parser, it is
51
+ believed that permitting the surrounding parentheses to be
52
+ omitted is likely to lead to confusing Ferret query
53
+ expressions.)
54
+
55
+ Multiple output fields can be specified by separating them with
56
+ commas. Surrounding parentheses are not necessary or permitted
57
+ around the right-hand side of an arrow.
58
+
59
+ More complex queries involve multiple tables and what relational
60
+ algebra calls /joins/. Since Ferret aims to provide a
61
+ navigational rather than purely relational interface, it
62
+ presents joins as /dereferencing/, denoted by a trailing [[->]]
63
+ operator. That is, the query
64
+
65
+ houses: number -> resident -> name, phone
66
+
67
+ can correspond to the SQL of
68
+
69
+ SELECT houses.number, houses.resident, residents.name,
70
+ residents.phone
71
+ FROM houses LEFT JOIN residents ON
72
+ houses.resident = residents.id
73
+ WHERE houses.number = ?
74
+
75
+ provided that the data schema specifies that the column of
76
+ [[houses.resident]] refers to [[residents]] through its [[id]].
77
+ (In SQL parlance, it needs to be defined as a foreign key.) If,
78
+ instead of [[left join]], an [[inner join]] is desired, the
79
+ two-ended dereferencing arrow [[<->]] needs to be used instead
80
+ of [[->]].
81
+
82
+ A Ferret data schema permitting such translation might look
83
+ roughly thus:
84
+
85
+ [houses]
86
+ id: primary key, integer
87
+ number: optional integer
88
+ name: optional varchar
89
+ street: varchar
90
+ resident: optional ref residents(id)
91
+
92
+ [residents]
93
+ id: primary key, integer
94
+ name: varchar = 'John Smith'
95
+ phone: optional varchar
96
+
97
+ Note that the columns are by default _not_ nullable but they can
98
+ be explicitly defined as nullable by the keyword [[optional]].
99
+ The [[=]] character followed by an SQL expression specifies the
100
+ default value for a column.
101
+
102
+ Also note that in the definition of [[resident: optional ref
103
+ residents(id)]], the [[(id)]] can be omitted because it's clear
104
+ from context -- [[residents.id]] is the primary key of
105
+ [[residents]].
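The line-oriented shape of the schema DSL can be sketched as follows. This is a simplified outline, assuming only the two line patterns that [[Ferret::Schema#initialize]] matches (a bracketed table header and a [[name: spec]] field line); the helper [[outline_schema]] is hypothetical and leaves each field's specification unparsed.

```ruby
# Hypothetical helper built around the two regular expressions that
# Ferret::Schema#initialize uses; the spec text is kept unparsed.
def outline_schema source
  tables = {}
  current = nil
  source.each_line do |line|
    line = line.strip
    # Blank lines and comment lines are skipped.
    next if line.empty? || line[0] == '#'
    if line =~ /^\[\s*(\w+)\s*\]\s*(#|$)/
      # A table header starts a new group of fields.
      current = tables[$1.downcase] = {}
    elsif line =~ /^(\w+)\s*:\s*/
      raise 'field outside of a table' unless current
      # Everything after the colon is the field's specification.
      current[$1.downcase] = $'
    else
      raise "unparseable line: #{line}"
    end
  end
  tables
end

outline = outline_schema <<~EOS
  [residents]
  id: primary key, integer
  name: varchar = 'John Smith'
  phone: optional varchar
EOS
```

Names are folded to lowercase here because the library keys its tables and fields by forced-lowercase names.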
106
+
107
+ This data schema permits only up to one resident per house.
108
+ What if the house->resident relation needs to have a 1->n shape
109
+ rather than 1->0..1? We could move the linking column from
110
+ [[houses]] to [[residents]], like this:
111
+
112
+ [houses]
113
+ id: primary key, integer
114
+ number: optional integer
115
+ name: optional varchar
116
+ street: varchar
117
+ resident: ghost ref residents(house)
118
+
119
+ [residents]
120
+ id: primary key, integer
121
+ name: varchar = 'John Smith'
122
+ phone: optional varchar
123
+ house: optional ref houses
124
+
125
+ Note that we're still defining [[houses.resident]] but it's no
126
+ longer a /column/ -- that is, it does not have a matching SQL
127
+ table column anymore --, but a /ghost field/.
128
+
129
+ Besides being primary keys, columns can be defined merely
130
+ [[unique]]. Ferret does not currently support composite
131
+ secondary keys, but a future version might.
132
+
133
+ How does [[Ferret#go]] deliver its results? It depends. If a
134
+ block is given to it, it will call this block with each row;
135
+ otherwise, it collects rows and returns them. (Actually, if
136
+ Ferret can prove, using [[unique]] and [[primary key]]
137
+ constraints, that the query necessarily produces 0 or 1 rows, it
138
+ will return either [[nil]] or the one row; otherwise, it will
139
+ return an array of the rows.) If the query specifies one column
140
+ (which may be preceded by dereferences), each 'row' will be
141
+ the value without encapsulation; otherwise, Ferret wraps rows
142
+ into [[OpenStruct]] instances. The multicolumn behaviour can be
143
+ forced by adding an explicit trailing comma after what would
144
+ otherwise be the single requested column. (Rationale: while
145
+ these rules are a bit clumsy to specify, they have proven
146
+ intuitive, in a Perlish way.)
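The delivery rules above can be pictured with a small sketch. The helper [[shape_rows]] and its two flags are hypothetical stand-ins for the decisions Ferret makes internally; they are not part of the library's API. Raw rows are arrays of column values, as an SQL driver would return them.

```ruby
require 'ostruct'

# Hypothetical helper, not sql-ferret's API: shapes raw SQL rows
# according to the rules described above.
def shape_rows rows, names, multicol:, single_row:
  prepared = rows.map do |row|
    if multicol
      # Several columns: wrap the row into an OpenStruct.
      OpenStruct.new(names.zip(row).to_h)
    else
      # One column: deliver the bare value.
      row.first
    end
  end
  # A provably single-row query yields the row or nil; anything
  # else yields an array.
  single_row ? prepared.first : prepared
end

one = shape_rows [[12, 'Elm Street']], %w{number street},
    multicol: true, single_row: true
many = shape_rows [[12], [14]], %w{number},
    multicol: false, single_row: false
```

Here [[one]] responds to [[number]] and [[street]], while [[many]] is a plain array of values.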
147
+
148
+ Each queried column can be given an explicit name, analogously
149
+ to SQL's [[AS]] clause, by specifying
150
+ it between apostrophes after the column appears in the
151
+ expression. Note that this is not a string literal; rather,
152
+ Ferret parses each apostrophe as a token, and the explicit name
153
+ must parse as a valid Ferret identifier token.
154
+
155
+ Star topology joins can be specified by surrounding a joining
156
+ arrow together with its right-hand side in parentheses, like
157
+ this:
158
+
159
+ houses: number -> resident (-> name, phone), street
160
+
161
+ Such parentheses can be nested.
162
+
163
+ In addition to retrieval, Ferret query expressions also support
164
+ modification and deletion of entries. This is notated by
165
+ terminating an expression in a 'blank' dereference operator
166
+ followed by a colon and a verb, like this:
167
+
168
+ houses: number -> resident ->: set
169
+
170
+ The fields to be changed will then be specified as named
171
+ arguments to [[Ferret#go]]. The verb [[update]] can also be
172
+ used instead of [[set]]; it has exactly the same meaning. When
173
+ the verb [[delete]] is used, [[Ferret#go]] does not take any
174
+ named arguments.
175
+
176
+ Outside the Ferret query expression mechanism, there's the
177
+ [[Ferret#insert]] method that takes the target table's name as a
178
+ mandatory argument and the values to be inserted as named
179
+ arguments, like this:
180
+
181
+ $ferret.insert 'residents',
182
+ house: 8,
183
+ name: 'Jacob Doe',
184
+ phone: '555-1212'
185
+
186
+ A future version of Ferret API is likely to provide record
187
+ insertion through [[Ferret#go]]. The reason we're not doing it
188
+ in this public release is that our autovivification mechanisms
189
+ are nowhere near settling yet.
190
+
191
+ Also of note is [[Ferret#change]], whose signature matches
192
+ [[Ferret#insert]] except that it performs the [[INSERT OR
193
+ REPLACE INTO ...]] operation instead of plain [[INSERT]], and
194
+ [[Ferret#transaction]], which supports recursive locking.
195
+ (This is quirky. It's mainly intended for use in library
196
+ functions that need to group Ferret or SQL operations for
197
+ atomicity without assuming that an outer transaction exists or
198
+ does not exist, and it needs care even then. Unless you know
199
+ you need it, you're probably better off using
200
+ [[SQLite3::Database#transaction]] directly.)
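The reentrancy idea can be sketched without a real database. Both classes below are hypothetical: [[Fake_DB]] stands in for a handle that, like SQLite, refuses nested transactions, and [[Reentrant_Wrapper]] mimics the flag-based approach of [[Ferret#transaction]]. Thread synchronisation is left out of this sketch.

```ruby
# A stand-in for the database handle; it refuses nested
# transactions. Hypothetical, for illustration only.
class Fake_DB
  attr_reader :begun

  def initialize
    @active = false
    @begun = 0
  end

  def transaction
    raise 'cannot start a transaction within a transaction' if @active
    @active = true
    @begun += 1
    begin
      yield
    ensure
      @active = false
    end
  end
end

# Remembers whether a transaction is already open and, if so, runs
# the block directly instead of opening another one.
class Reentrant_Wrapper
  def initialize db
    @db = db
    @locked = false
  end

  def transaction
    return yield if @locked
    @db.transaction do
      begin
        @locked = true
        yield
      ensure
        @locked = false
      end
    end
  end
end

db = Fake_DB.new
wrapper = Reentrant_Wrapper.new db
result = wrapper.transaction do
  wrapper.transaction { :inner_done }
end
```

The inner call piggybacks on the outer transaction, so the underlying handle only ever sees one.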
201
+
202
+ Instead of a single input value, it's permitted to
203
+ pass a whole collection of input values to [[Ferret#go]] --
204
+ then, Ferret uses [[foo in (?, ...)]] instead of [[foo = ?]],
205
+ and won't consider this column's possible declared uniquity when
206
+ deciding whether the query is a single-row query --, or [[nil]],
207
+ in which case Ferret uses [[foo is null]] for proper SQL-style
208
+ nullity checking. (A 'collection' is defined through duck
209
+ typing -- anything that produces more than one value when the
210
+ [[*]] prefix operator is applied to it.)
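A minimal sketch of this three-way choice, assuming a hypothetical helper [[filter_test]] (the library's own [[Parameter_Collector]] is not shown in this excerpt and may work differently):

```ruby
# Hypothetical helper, not sql-ferret's Parameter_Collector: returns
# an SQL test for the given column plus the values to bind.
def filter_test column, input
  # nil needs SQL-style nullity checking rather than '= ?'.
  return ["#{column} is null", []] if input.nil?
  # Duck typing: whatever the splat operator expands to.
  values = [*input]
  if values.length > 1
    marks = (['?'] * values.length).join(', ')
    ["#{column} in (#{marks})", values]
  else
    ["#{column} = ?", values]
  end
end

single = filter_test 'number', 8
several = filter_test 'number', [8, 10]
missing = filter_test 'number', nil
```

Because the splat operator is used, arrays, sets, and ranges would all be treated as collections in this sketch.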
211
+
212
+ When it's desired that [[Ferret#go]] produce distinct values, a
213
+ trailing [[: distinct]] or [[: select distinct]] can be used.
214
+ (These two are synonymous.) Note that for query-type verbs, the
215
+ colon *must not* be preceded by a blank-RHS dereferencing arrow,
216
+ unlike for mutation-type verbs, which require it.
217
+
218
+ Besides straight values, Ferret supports interpreted values.
219
+ The set of such is currently hardcoded and is:
220
+
221
+ iso8601
222
+ unix_time
223
+ subsecond_unix_time
224
+ json
225
+ pretty_json
226
+ yaml
227
+ ruby_marshal
228
+ packed_hex
229
+
230
+ When a Ferret schema assigns an interpreted rather than straight
231
+ data type to a column, [[Ferret#go]] will automatically
232
+ interpret and 'deterpret' values for this column, unless the
233
+ column's name is prefixed with a backslash in the expression.
234
+ Note that [[Ferret#insert]] does not (currently?) support
235
+ interpretation, and always processes raw values.
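For a feel of what interpreting and 'deterpreting' means, here are round trips for three of the listed types, using the same standard library calls that [[Ferret.interpret]] and [[Ferret.deterpret]] rely on. The sample values are made up for illustration.

```ruby
require 'json'
require 'time'

# packed_hex: stored as hex text, delivered as a binary string.
stored_hex = 'cafe01'
bytes = [stored_hex].pack('H*')
back_to_hex = bytes.unpack('H*').first

# iso8601: stored as text, delivered as a Time.
stored_time = '2015-02-03T12:00:00Z'
time = Time.xmlschema(stored_time)
back_to_text = time.xmlschema

# json: stored as text, delivered as parsed Ruby data.
data = JSON.parse('{"rooms": 3}')
back_to_json = JSON.generate(data)
```

Prefixing a column's name with a backslash in the expression would skip this conversion and deliver the stored form as is.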
236
+
237
+
238
+ == Likely future development
239
+
240
+ - 'en passant' filters in addition to 'initial' filters. These
241
+ will probably be notated by brackets, and may permit
242
+ ordering comparison in addition to equality checks;
243
+ - use of [[Range]] values in addition to collections as filters
244
+ passed to [[Ferret#go]];
245
+ - explicit ordering of the produced rows, probably via unary
246
+ postfix operators;
247
+ - SQL grouping and aggregate functions;
248
+ - [[Ferret#go]] returning rows as a [[Hash]] instead of an
249
+ [[Array]], using a given key or keys;
250
+ - Kleene dereferencing arrows [[-*>]] and [[-+>]] (and their
251
+ double-ended [[INNER JOIN]] counterparts), implemented via
252
+ SQLite's recently introduced [[WITH RECURSIVE]] construct;
253
+ - multi-column uniquity constraints and foreign keys;
254
+ - automated handling of SIKR (Strictly Incremental Knowledge
255
+ Representation) packets so that Fossil-style multinode
256
+ tracking could be implemented for (nearly) arbitrary data
257
+ structures;
258
+ - accessing SQL's views;
259
+ - customisable autovivification in multistage insertions;
260
+ - defining [[Hash]]-like interfaces atop [[Ferret]] that would
261
+ be backed with a custom, potentially joined, relation in the
262
+ underlying SQL table;
263
+ - defining per-column access-controlled [[Ferret]]-like subAPIs;
264
+ - integration of [[insert]] and [[change]] into [[Ferret#go]];
265
+ - an interface for Android's built-in SQLite API as available
266
+ through Ruboto, as an alternative to the [[sqlite3]] Rubygem
267
+ which is not available on Ruboto;
268
+ - better documentation;
269
+ - renaming to avoid clashing with
270
+ <https://github.com/jkraemer/ferret>.
271
+
272
+
273
+ == Possible future development
274
+
275
+ - tracking the underlying SQLite database's schema via [[PRAGMA
276
+ user_version]] and automatically upgrading it;
277
+ - command line tools for database setup and data import and
278
+ export;
279
+ - transparently joining another table so as to implement
280
+ [[is-a]] type relation atop a SQL data model;
281
+ - transparently interpreting JSON or YAML data as extra payload
282
+ fields of their containing table without explicit formal
283
+ specification, akin to MongoDB;
284
+ - transparent compression of blob/YAML/JSON fields;
285
+ - extracting column type data, constraints, and foreign keys
286
+ from the [[sqlite_master]] data so that the Ferret schema
287
+ would only need to specify ghost fields and interpretations;
288
+ - a notation for /ad hoc/ joins;
289
+ - a third, hybrid DSL: Ferret dereference operator extensions to
290
+ basic SQL queries, something like [[SELECT number, resident ->
291
+ name, resident -> phone FROM houses WHERE id = ?]] or perhaps
292
+ [[SELECT number, resident -> (name, phone) FROM ...]];
293
+ - custom enums as interpretations;
294
+ - global primary key definition in the schema.
@@ -0,0 +1,1719 @@
1
+ require 'json'
2
+ require 'ostruct'
3
+ require 'set'
4
+ require 'time'
5
+ require 'ugh'
6
+ require 'yaml'
7
+
8
+ class Ferret
9
+ module Constants
10
+ QSF_MULTICOL = 0x01
11
+ QSF_MULTIROW = 0x02
12
+ end
13
+
14
+ attr_reader :schema
15
+
16
+ # If the caller can guarantee that this [[Ferret]] instance
17
+ # is never accessed from multiple threads at once, it can
18
+ # turn off using the internal mutex by passing [[use_mutex:
19
+ # false]] for a small performance increase.
20
+ def initialize schema_source, sqlite = nil, use_mutex: true
21
+ raise 'type mismatch' unless schema_source.is_a? String
22
+ super()
23
+ @schema = Ferret::Schema.new(schema_source)
24
+ @sqlite = sqlite
25
+ # Guards access to [[@sqlite]] and
26
+ # [[@sqlite_locked]].
27
+ @sync_mutex = use_mutex ? Mutex.new : nil
28
+ # Are we currently in a transaction? (This lets us
29
+ # implement [[Ferret#transaction]] reëntrantly.)
30
+ @sqlite_locked = false
31
+ return
32
+ end
33
+
34
+ def change table_name, **changes
35
+ ugh? attempted: 'ferret-change' do
36
+ table = @schema[table_name] or
37
+ ugh 'unknown-table', table: table_name
38
+ sql = table.sql_to_change changes.keys.map(&:to_s)
39
+ _sync{@sqlite.execute sql, **changes}
40
+ end
41
+ return
42
+ end
43
+
44
+ def insert table_name, **changes
45
+ ugh? attempted: 'ferret-insert' do
46
+ table = @schema[table_name] or
47
+ ugh 'unknown-table', table: table_name
48
+ sql = table.sql_to_insert changes.keys.map(&:to_s)
49
+ _sync do
50
+ @sqlite.execute sql, **changes
51
+ return @sqlite.last_insert_row_id
52
+ end
53
+ end
54
+ end
55
+
56
+ def transaction &thunk
57
+ # Note that [[_sync]] is reentrant, too.
58
+ _sync do
59
+ # If we get to this point, the only 'concurrent' access
60
+ # might come from our very own thread -- that is, a
61
+ # subroutine down the execution stack from present. This
62
+ # means that we can now access [[@sqlite_locked]] as
63
+ # though we were in a single-threading environment, and
64
+ # thus use it as a flag for 'this thread has already
65
+ # acquired the SQLite-level lock so there's no need to
66
+ # engage it again'. (SQLite's transaction mechanism on
67
+ # its own is not reentrant.)
68
+ if @sqlite_locked then
69
+ return yield
70
+ else
71
+ return @sqlite.transaction do
72
+ begin
73
+ @sqlite_locked = true
74
+ return yield
75
+ ensure
76
+ @sqlite_locked = false
77
+ end
78
+ end
79
+ end
80
+ end
81
+ end
82
+
83
+ def _sync &thunk
84
+ if @sync_mutex.nil? or @sync_mutex.owned? then
85
+ return yield
86
+ else
87
+ @sync_mutex.synchronize &thunk
88
+ end
89
+ end
90
+ private :_sync
91
+
92
+ def go raw_expr, *inputs, **changes, &thunk
93
+ expr = Ferret::Expression_Parser.new(raw_expr, @schema).expr
94
+
95
+ ugh? expr: raw_expr do
96
+ ugh? attempted: 'ferret-go' do
97
+ if inputs.length > expr.exemplars.length then
98
+ ugh 'too-many-exemplars-given',
99
+ expected: expr.exemplars.length,
100
+ given: inputs.length
101
+ elsif inputs.length < expr.exemplars.length then
102
+ ugh 'not-enough-exemplars-given',
103
+ expected: expr.exemplars.length,
104
+ given: inputs.length
105
+ end
106
+ end
107
+
108
+ if thunk and ![:select, :select_distinct].
109
+ include? expr.type then
110
+ ugh 'superfluous-thunk-supplied',
111
+ explanation: 'query-not-a-select'
112
+ end
113
+
114
+ case expr.type
115
+ when :select, :select_distinct then
116
+ ugh? attempted: 'ferret-select' do
117
+ ugh 'superfluous-changes' \
118
+ unless changes.empty?
119
+
120
+ ast = expr.select
121
+
122
+ # At least for now, all the parameters behave as
123
+ # simple ANDed filter rules.
124
+ inputs_imply_single_row = false
125
+ coll = Ferret::Parameter_Collector.new
126
+ expr.exemplars.zip(inputs).each_with_index do
127
+ |(exemplar_spec, input), seq_no|
128
+ test, selects_one_p = coll.feed input, exemplar_spec
129
+ inputs_imply_single_row |= selects_one_p
130
+ ast.sql.gsub! /\[test\s+#{seq_no}\]/, test
131
+ end
132
+
133
+ # Let's now compose the framework of executing the
134
+ # query from [[proc]]:s.
135
+
136
+ # [[tuple_preparer]] takes a tuple of raw values
137
+ # fetched from SQL and prepares it into a deliverable
138
+ # object.
139
+ tuple_preparer = if ast.shape & QSF_MULTICOL == 0 then
140
+ # A single column was requested. The deliverable
141
+ # object is the piece of data from this column.
142
+ proc do |row|
143
+ Ferret.interpret ast.outputs.values.first,
144
+ row.first
145
+ end
146
+ else
147
+ # Multiple columns were requested (or one column in
148
+ # multicolumn mode). The deliverable object is an
149
+ # [[OpenStruct]] mapping field names to data from
150
+ # these fields.
151
+ proc do |row|
152
+ output = OpenStruct.new
153
+ raise 'assertion failed' \
154
+ unless row.length == ast.outputs.size
155
+ # Note that we're relying on modern Ruby's
156
+ # [[Hash]]'s retention of key order here.
157
+ ast.outputs.to_a.each_with_index do
158
+ |(name, interpretation), i|
159
+ output[name] =
160
+ Ferret.interpret interpretation, row[i]
161
+ end
162
+ output
163
+ end
164
+ end
165
+
166
+ # [[query_executor]] takes a [[proc]], executes the
167
+ # query, and calls [[proc]] with each tuple prepared
168
+ # by [[tuple_preparer]].
169
+ query_executor = proc do |&result_handler|
170
+ @sqlite.execute ast.sql, **coll do |row|
171
+ result_handler.call tuple_preparer.call(row)
172
+ end
173
+ end
174
+
175
+ # [[processor]] executes the query and delivers
176
+ # results either by yielding to [[thunk]] if it has
177
+ # been given or by returning them if not, taking into
178
+ # account the query's shape.
179
+ if thunk then
180
+ # A thunk was supplied -- we'll just pass prepared
181
+ # rows to it.
182
+ processor = proc do
183
+ query_executor.call &thunk
184
+ end
185
+ else
186
+ # Why [[and]] here? Well, the shape flag tells us
187
+ # whether the query can translate one input to more
188
+ # than one, and [[inputs_imply_single_row]] tells us
189
+ # whether there are more than one input values that
190
+ # thus get translated. We can only know that the
191
+ # result is a single-row table if both of these
192
+ # preconditions are satisfied.
193
+ if (ast.shape & QSF_MULTIROW == 0) and
194
+ inputs_imply_single_row then
195
+ # A single row was requested (implicitly, by using
196
+ # a unique field as an exemplar). We'll return
197
+ # this row, or [[nil]] if nothing was found.
198
+ processor = lambda do
199
+ query_executor.call do |output|
200
+ return output
201
+ end
202
+ return nil
203
+ end
204
+ else
205
+ # Many rows were requested. We'll collect them to
206
+ # a list and return it.
207
+ processor = proc do
208
+ results = []
209
+ query_executor.call do |output|
210
+ results.push output
211
+ end
212
+ return results
213
+ end
214
+ end
215
+ end
216
+
217
+ _sync &processor
218
+ end
219
+
220
+ when :update then
221
+ ugh? attempted: 'ferret-update' do
222
+ ugh 'missing-changes' \
223
+ if changes.empty?
224
+
225
+ changed_table = expr.stages.last.table
226
+ sql = "update #{changed_table.name} set "
227
+ changes.keys.each_with_index do |fn, i|
228
+ field = changed_table[fn.to_s]
229
+ ugh 'unknown-field', field: fn,
230
+ table: changed_table.name,
231
+ role: 'changed-field' \
232
+ unless field
233
+ sql << ", " unless i.zero?
234
+ sql << "#{field.name} = :#{fn}"
235
+ end
236
+
237
+ if expr.stages.length > 1 then
238
+ ast = expr.select
239
+ sql << " where " <<
240
+ expr.stages.last.stalk.ref.name <<
241
+ " in (#{ast.sql})"
242
+ else
243
+ # Special case: the criteria and the update live in
244
+ # a single table, so we won't need to do any joining
245
+ # or subquerying.
246
+ unless expr.exemplars.empty? then
247
+ sql << " " << expr.where_clause
248
+ end
249
+ end
250
+
251
+ # We're going to pass the changes to
252
+ # [[SQLite3::Database#execute]] in a [[Hash]].
253
+ # Unfortunately, the Ruby interface of SQLite does not
254
+ # support mixing numbered and named arguments. As a
255
+ # workaround, we'll pass the exemplar as a named
256
+ # argument whose name is a number. This is also
257
+ # convenient because it avoids clashes with any other
258
+ # named parameters -- those are necessarily column
259
+ # names, and column names can not be numbers.
260
+ coll = Ferret::Parameter_Collector.new
261
+ expr.exemplars.zip(inputs).each_with_index do
262
+ |(exemplar_spec, input), seq_no|
263
+ test, selects_one_p = coll.feed input, exemplar_spec
264
+ sql.gsub! /\[test\s+#{seq_no}\]/, test
265
+ end
266
+
267
+ _sync do
268
+ @sqlite.execute sql, **coll, **changes
269
+ return @sqlite.changes
270
+ end
271
+ end
272
+
273
+ when :delete then
274
+ ugh? attempted: 'ferret-delete' do
275
+ ugh 'superfluous-changes' \
276
+ unless changes.empty?
277
+
278
+ affected_table = expr.stages.last.table
279
+ sql = "delete from #{affected_table.name} "
280
+
281
+ if expr.stages.length > 1 then
282
+ ast = expr.select
283
+ sql << " where " <<
284
+ expr.stages.last.stalk.ref.name <<
285
+ " in (#{ast.sql})"
286
+ else
287
+ # Special case: the criteria live in the affected
288
+ # table, so we won't need to do any joining or
289
+ # subquerying.
290
+ unless expr.exemplars.empty? then
291
+ sql << " " << expr.where_clause
292
+ end
293
+ end
294
+
295
+ coll = Ferret::Parameter_Collector.new
296
+ expr.exemplars.zip(inputs).each_with_index do
297
+ |(exemplar_spec, input), seq_no|
298
+ test, selects_one_p = coll.feed input, exemplar_spec
299
+ sql.gsub! /\[test\s+#{seq_no}\]/, test
300
+ end
301
+
302
+ _sync do
303
+ @sqlite.execute sql, **coll
304
+ return @sqlite.changes
305
+ end
306
+ end
307
+
308
+ else
309
+ raise 'assertion failed'
310
+ end
311
+ end
312
+ end
313
+
314
+ include Constants
315
+
316
+ def pragma_user_version
317
+ _sync do
318
+ return @sqlite.get_first_value 'pragma user_version'
319
+ end
320
+ end
321
+
322
+ def pragma_user_version= new_version
323
+ raise 'type mismatch' unless new_version.is_a? Integer
324
+ _sync do
325
+ @sqlite.execute 'pragma user_version = ?', new_version
326
+ end
327
+ return new_version
328
+ end
329
+
330
+ def create_table name
331
+ ugh? attempted: 'ferret-create-table' do
332
+ _sync do
333
+ @sqlite.execute @schema.sql_to_create_table(name)
334
+ end
335
+ end
336
+ return
337
+ end
338
+
339
+ def self::interpret interpretation, value
340
+ # If a [[null]] came from the database, we'll interpret it
341
+ # as a [[nil]].
342
+ return nil if value.nil?
343
+ ugh? interpretation: interpretation.to_s,
344
+ input: value.inspect do
345
+ case interpretation
346
+ when nil then
347
+ return value
348
+ when :unix_time, :subsecond_unix_time then
349
+ ugh 'interpreted-value-type-error',
350
+ input: value.inspect,
351
+ expected: 'Numeric' \
352
+ unless value.is_a? Numeric
353
+ return Time.at(value)
354
+ when :iso8601 then
355
+ ugh 'interpreted-value-type-error',
356
+ input: value.inspect,
357
+ expected: 'String' \
358
+ unless value.is_a? String
359
+ return Time.xmlschema(value)
360
+ when :json, :pretty_json then
361
+ ugh 'interpreted-value-type-error',
362
+ input: value.inspect,
363
+ expected: 'String' \
364
+ unless value.is_a? String
365
+ return JSON.parse(value)
366
+ when :yaml then
367
+ ugh 'interpreted-value-type-error',
368
+ input: value.inspect,
369
+ expected: 'String' \
370
+ unless value.is_a? String
371
+ return YAML.load(value)
372
+ when :ruby_marshal then
373
+ ugh 'interpreted-value-type-error',
374
+ input: value.inspect,
375
+ expected: 'String' \
376
+ unless value.is_a? String
377
+ return Marshal.load(value)
378
+ when :packed_hex then
379
+ ugh 'interpreted-value-type-error',
380
+ input: value.inspect,
381
+ expected: 'String' \
382
+ unless value.is_a? String
383
+ ugh 'invalid-hex-data',
384
+ input: value \
385
+ unless value =~ /\A[\dabcdef]*\Z/
386
+ ugh 'odd-length-hex-data',
387
+ input: value \
388
+ unless value.length % 2 == 0
389
+ return [value].pack('H*')
390
+ else
391
+ raise 'assertion failed'
392
+ end
393
+ end
394
+ end
395
+
396
+ def self::deterpret interpretation, object
397
+ # Note that we're not handling [[nil]] any specially. If
398
+ # this field permits [[null]] values, it's the caller's --
399
+ # who lives somewhere in the query execution wrapper of
400
+ # Ferret -- job to handle [[nil]], and if it doesn't, passing
401
+ # [[nil]] to [[deterpret]] is either an error or, in case of
402
+ # YAML, requires special escaping.
403
+ case interpretation
404
+ when nil then
405
+ return object
406
+ when :unix_time then
407
+ ugh 'deterpreted-value-type-error',
408
+ input: object.inspect,
409
+ expected: 'Time' \
410
+ unless object.is_a? Time
411
+ return object.to_i
412
+ when :subsecond_unix_time then
413
+ ugh 'deterpreted-value-type-error',
414
+ input: object.inspect,
415
+ expected: 'Time' \
416
+ unless object.is_a? Time
417
+ return object.to_f
418
+ when :iso8601 then
419
+ ugh 'deterpreted-value-type-error',
420
+ input: object.inspect,
421
+ expected: 'Time' \
422
+ unless object.is_a? Time
423
+ return object.xmlschema
424
+ when :json then
425
+ return JSON.generate(object)
426
+ when :pretty_json then
427
+ return JSON.pretty_generate(object)
428
+ when :yaml then
429
+ return YAML.dump(object)
430
+ when :ruby_marshal then
431
+ return Marshal.dump(object)
432
+ when :packed_hex then
433
+ ugh 'deterpreted-value-type-error',
434
+ input: object.inspect,
435
+ expected: 'String' \
436
+ unless object.is_a? String
437
+ return object.unpack('H*').first
438
+ else
439
+ raise 'assertion failed'
440
+ end
441
+ end
442
+ end
443
+
444
+ class Ferret::Alias_Generator
445
+ def initialize used_ids
446
+ super()
447
+ @used_ids = Set.new used_ids
448
+ @counter = 0
449
+ return
450
+ end
451
+
452
+ def create prefix
453
+ begin
454
+ @counter += 1
455
+ candidate = prefix + @counter.to_s
456
+ end while @used_ids.include? candidate
457
+ @used_ids.add candidate
458
+ return candidate
459
+ end
460
+
461
+ def available? id
462
+ return !@used_ids.include?(id)
463
+ end
464
+
465
+ def reserve id
466
+ if @used_ids.include? id then
467
+ ugh 'already-reserved', identifier: id
468
+ end
469
+ @used_ids.add id
470
+ return id
471
+ end
472
+ end
473
+
474
+ class Ferret::Schema
475
+ def initialize schema_source
476
+ raise 'type mismatch' unless schema_source.is_a? String
477
+ super()
478
+ @tables = {} # keyed by forced-lowercase names
479
+ lineno = 0
480
+ curtable = nil
481
+ relocs = [] # a list of [[Proc]]:s
482
+ @used_ids = Set.new
483
+ # so we can avoid clashes when generating aliases;
484
+ # forced downcase
485
+ schema_source.each_line do |line|
486
+ line.strip!
487
+ lineno += 1
488
+ ugh? context: 'parsing-ferret-schema',
489
+ input: line,
490
+ lineno: lineno do
491
+ if line.empty? or line[0] == ?# then
492
+ next
493
+ elsif line =~ /^\[\s*(\w+)\s*\]\s*(#|$)/ then
494
+ name = $1
495
+ dname = name.downcase
496
+ ugh 'duplicate table name', table: name \
497
+ if @tables.has_key? dname
498
+ curtable = @tables[dname] = Ferret::Table.new name
499
+ @used_ids.add dname
500
+ elsif line =~ /^(\w+)\s*:\s*/ then
501
+ name, spec = $1, $'
502
+ # Note that [[add_field]] will check the field's name
503
+ # for uniquity.
504
+ curtable.add_field(
505
+ Ferret::Field.new(curtable, name, spec) do |thunk|
506
+ relocs.push thunk
507
+ end)
508
+ @used_ids.add name.downcase
509
+ else
510
+ ugh 'unparseable-line'
511
+ end
512
+ end
513
+ end
514
+ # Now that we have loaded everything, we can resolve the
515
+ # pointers.
516
+ @tables.each_value do |table|
517
+ ugh 'table-without-columns',
518
+ table: table.name \
519
+ unless table.has_columns?
520
+ end
521
+ relocs.each do |thunk|
522
+ thunk.call self
523
+ end
524
+ return
525
+ end
526
+
527
+ def alias_generator
528
+ return Ferret::Alias_Generator.new(@used_ids)
529
+ end
530
+
531
+ def [] name
532
+ return @tables[name.downcase]
533
+ end
534
+
535
+ def tables
536
+ return @tables.values
537
+ end
538
+
539
+ def sql_to_create_table name
540
+ table = self[name]
541
+ unless table then
542
+ ugh 'unknown-table',
543
+ table: name
544
+ end
545
+ return table.sql_to_create
546
+ end
547
+ end
548
+
549
+ class Ferret::Table
550
+ attr_reader :name
551
+ def initialize name
552
+ raise 'type mismatch' unless name.is_a? String
553
+ super()
554
+ @name = name
555
+ @fields = {} # keyed by forced-lowercase names
556
+ return
557
+ end
558
+
559
+ def [] name
560
+ return @fields[name.downcase]
561
+ end
562
+
563
+ def empty?
564
+ return @fields.empty?
565
+ end
566
+
567
+ def columns
568
+ return @fields.values.select(&:column?)
569
+ end
570
+
571
+ def has_columns?
572
+ return @fields.values.any?(&:column?)
573
+ end
574
+
575
+ # FIXME: move to the section for data model
576
+ attr_reader :primary_key
577
+
578
+ # [[Table#add_field]] is how new [[Field]]:s get added to a
579
+ # [[Table]] as it gets parsed from a Ferret schema. Thus, we
580
+ # check for field name duplication and primary key clashes
581
+ # here. This is also a convenient place to set up
582
+ # [[Table@primary_key]], too, as well as to check against a
583
+ # table having been declared with multiple primary keys.
584
+ def add_field field
585
+ raise 'type mismatch' unless field.is_a? Ferret::Field
586
+ raise 'assertion failed' \
587
+ unless field.table.object_id == self.object_id
588
+ dname = field.name.downcase
589
+ ugh? table: @name do
590
+ ugh 'duplicate-field', field: field.name \
591
+ if @fields.has_key? dname
592
+ if field.primary_key? then
593
+ if @primary_key then
594
+ ugh 'primary-key-clash',
595
+ key1: @primary_key.name,
596
+ key2: field.name
597
+ end
598
+ @primary_key = field
599
+ end
600
+ end
601
+ @fields[dname] = field
602
+ return field
603
+ end
604
+
605
+ def sql_to_change given_column_names
606
+ key_column = sole_unique_column_among given_column_names
607
+
608
+ given_columns = resolve_column_names given_column_names
609
+
610
+ sql = "insert or replace into " + @name +
611
+ "(" + columns.map(&:name).join(', ') + ") "
612
+
613
+ ag = Ferret::Alias_Generator.new [@name, *@fields.keys]
614
+ old_alias, new_alias = %w{old new}.map do |prefix|
615
+ ag.available?(prefix) ?
616
+ ag.reserve(prefix) :
617
+ ag.create(prefix)
618
+ end
619
+
620
+ # Specify which field values are new and which ones are to
621
+ # be retained (or initialised from defaults)
622
+ sql << "select " << columns.map{|column| '%s.%s' % [
623
+ given_columns.include?(column) ? new_alias : old_alias,
624
+ column.name,
625
+ ]}.join(', ')
626
+
627
+ # Encode the changes as a subquery
628
+ sql << " from (select " << given_column_names.map{|fn|
629
+ ":#{fn} as #{fn}"}.join(', ') << ")"
630
+
631
+ # Left-join the subquery against the preëxisting table
632
+ sql << (" as %{new} left join %{table} as %{old} " +
633
+ "on %{new}.%{key} = %{old}.%{key}") % {
634
+ :old => old_alias,
635
+ :new => new_alias,
636
+ :key => key_column.name,
637
+ :table => @name,
638
+ }
639
+
640
+ return sql
641
+ end
642
+
643
+ # Given a list of column names, figure out which of them is
644
+ # the one and only unique (or primary key) field for this
645
+ # table. Ugh if any of them is not a field name; if any field
646
+ # is mentioned multiple times; if multiple [[unique]] fields
647
+ # are mentioned; or if no [[unique]] fields are mentioned.
648
+ def sole_unique_column_among column_names
649
+ ugh? table: @name do
650
+ given_columns = resolve_column_names column_names
651
+ unique_column = nil
652
+ given_columns.each do |column|
653
+ if column.unique? then
654
+ if unique_column then
655
+ ugh 'unique-column-conflict',
656
+ field1: unique_column.name,
657
+ field2: column.name
658
+ end
659
+ unique_column = column
660
+ end
661
+ end
662
+ ugh 'no-unique-column-given',
663
+ fields: given_columns.map(&:name).join(', '),
664
+ known_unique_fields:
665
+ @fields.values.select(&:unique?).
666
+ map(&:name).join(', ') \
667
+ unless unique_column
668
+ return unique_column
669
+ end
670
+ end
671
+
672
+ def sql_to_insert given_column_names
673
+ ugh? table: @name do
674
+ # We have to check this, lest we generate broken SQL.
675
+ ugh 'inserting-null-tuple' \
676
+ if given_column_names.empty?
677
+
678
+ given_columns = resolve_column_names given_column_names
679
+
680
+ # Check that all the mandatory fields are given
681
+ columns.each do |field| # ghost fields are not stored, so never mandatory
682
+ next if field.optional? or field.default
683
+ next if given_columns.include? field
684
+ # SQLite can autopopulate the [[integer primary key]]
685
+ # field.
686
+ next if field.primary_key? and field.type == 'integer'
687
+ ugh 'mandatory-value-missing',
688
+ table: @name,
689
+ column: field.name,
690
+ given_columns: given_columns.map(&:name).join(' ')
691
+ end
692
+
693
+ return "insert into " +
694
+ "#{@name}(#{given_columns.map(&:name).join ', '}) " +
695
+ "values(:#{given_column_names.join ', :'})"
696
+ end
697
+ end
698
+
699
+ def resolve_column_names names
700
+ results = []
701
+ names.each do |fn|
702
+ raise 'type mismatch' \
703
+ unless fn.is_a? String
704
+ field = @fields[fn.downcase]
705
+ ugh 'unknown-field', field: fn,
706
+ known_fields: @fields.values.map(&:name).
707
+ join(', ') \
708
+ unless field
709
+ ugh 'not-a-column', field: field.name \
710
+ unless field.column?
711
+ ugh 'duplicate-field', field: field.name \
712
+ if results.include? field
713
+ results.push field
714
+ end
715
+ return results
716
+ end
717
+
718
+ def sql_to_create
719
+ # No trailing semicolon.
720
+ return "create table #{name} (\n " +
721
+ @fields.values.select(&:column?).
722
+ map(&:sql_to_declare).join(",\n ") +
723
+ ")"
724
+ end
725
+ end
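To make the output of [[Table#sql_to_insert]] concrete, here is a minimal stand-alone sketch of its final string assembly. The helper name `insert_sql_sketch` and the `people` table are hypothetical and not part of the gem. The sketch skips field resolution and the mandatory-field checks, and only shows how the named placeholders line up with the column list.

```ruby
# Hypothetical helper (not part of sql-ferret): mirrors only the string
# assembly at the end of Table#sql_to_insert, assuming the column names
# have already been validated.
def insert_sql_sketch(table_name, column_names)
  # An empty column list would yield broken SQL, so reject it up front.
  raise ArgumentError, 'inserting-null-tuple' if column_names.empty?
  "insert into #{table_name}(#{column_names.join(', ')}) " +
    "values(:#{column_names.join(', :')})"
end

puts insert_sql_sketch('people', %w[name age])
```

Each column name doubles as its own named placeholder, so the caller can presumably bind values by column name when executing the statement.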
726
+
727
+ class Ferret::Lexical_Ruleset
728
+ attr_reader :multichar
729
+
730
+ def initialize simple: [],
731
+ intertoken: [],
732
+ multichar: []
733
+
734
+ raise 'duck type mismatch' \
735
+ unless intertoken.respond_to? :include?
736
+ raise 'duck type mismatch' \
737
+ unless simple.respond_to? :include?
738
+ raise 'duck type mismatch' \
739
+ unless multichar.respond_to? :include?
740
+ super()
741
+ @intertoken = intertoken
742
+ @simple = simple
743
+ @multichar = multichar
744
+ return
745
+ end
746
+
747
+ def intertoken? c
748
+ return @intertoken.include? c
749
+ end
750
+
751
+ def simple_particle? c
752
+ return @simple.include? c
753
+ end
754
+
755
+ def id_starter? c
756
+ return [(?A .. ?Z), (?a .. ?z), [?_]].
757
+ any?{|s| s.include? c}
758
+ end
759
+
760
+ def id_continuer? c
761
+ return [(?A .. ?Z), (?a .. ?z), (?0 .. ?9), [?_]].
762
+ any?{|s| s.include? c}
763
+ end
764
+ end
765
+
766
+ Ferret::LEXICAL_RULESET = Ferret::Lexical_Ruleset.new(
767
+ simple: ",:*()'\\<>",
768
+ multichar: %w{-> <-> <= >=},
769
+ intertoken: " \t\n\r\f")
770
+
771
+ class Ferret::Scanner
772
+ def initialize expr
773
+ raise 'type mismatch' unless expr.is_a? String
774
+ super()
775
+ @expr = expr
776
+ @lex = Ferret::LEXICAL_RULESET
777
+
778
+ @offset_ahead = 0
779
+ @token_ahead = nil
780
+ @offset_atail = nil
781
+ @offset_behind = nil
782
+ return
783
+ end
784
+
785
+ def _skip_intertoken_space
786
+ loop do
787
+ break if @offset_ahead >= @expr.length
788
+ break unless @lex.intertoken? @expr[@offset_ahead]
789
+ @offset_ahead += 1
790
+ end
791
+ return
792
+ end
793
+ private :_skip_intertoken_space
794
+
795
+ def peek_token
796
+ return @token_ahead if @token_ahead
797
+
798
+ # Note that [[peek_token]] advances [[@offset_ahead]] to
799
+ # skip over preceding intertoken space but no further.
800
+ # Instead, it'll store the end offset of the peeked token
801
+ # in [[@offset_atail]].
802
+ _skip_intertoken_space
803
+
804
+ # check for eof
805
+ if @offset_ahead >= @expr.length then
806
+ @offset_atail = @offset_ahead
807
+ return @token_ahead = nil
808
+ end
809
+
810
+ # check for an identifier
811
+ if @lex.id_starter? @expr[@offset_ahead] then
812
+ @offset_atail = @offset_ahead
813
+ loop do
814
+ @offset_atail += 1
815
+ break unless @lex.id_continuer? @expr[@offset_atail]
816
+ end
817
+ return @token_ahead =
818
+ @expr[@offset_ahead ... @offset_atail]
819
+ end
820
+
821
+ # check for multi-char particles
822
+ @lex.multichar.each do |etalon|
823
+ if @expr[@offset_ahead, etalon.length] == etalon then
824
+ @offset_atail = @offset_ahead + etalon.length
825
+ return @token_ahead = etalon.to_sym
826
+ end
827
+ end
828
+
829
+ # check for single-char particles
830
+ if @lex.simple_particle? @expr[@offset_ahead] then
831
+ @offset_atail = @offset_ahead + 1
832
+ return @token_ahead = @expr[@offset_ahead].chr.to_sym
833
+ end
834
+
835
+ # give up
836
+ ugh 'ferret-lexical-error',
837
+ input: @expr,
838
+ offset: @offset_ahead,
839
+ lookahead: @expr[@offset_ahead, 10],
840
+ lookbehind: @expr[
841
+ [@offset_ahead - 10, 0].max ... @offset_ahead]
842
+ end
843
+
844
+ def expected! expectation, **extra
845
+ # We'll call [[peek_token]] in advance so that
846
+ # [[@offset_ahead]] would point exactly at the next token.
847
+ tok = peek_token
848
+ ugh('ferret-parse-error',
849
+ expected: expectation,
850
+ got: (tok || '*eof*').to_s,
851
+ input: @expr,
852
+ offset: @offset_ahead,
853
+ **extra)
854
+ end
855
+
856
+ def _consume_token_ahead
857
+ raise 'assertion failed' unless @offset_atail
858
+ @offset_behind = @offset_ahead
859
+ @offset_ahead = @offset_atail
860
+ @token_ahead = nil
861
+ @offset_atail = nil
862
+ return
863
+ end
864
+ private :_consume_token_ahead
865
+
866
+ def get_optional_id
867
+ tok = peek_token
868
+ if tok.is_a? String then
869
+ _consume_token_ahead
870
+ return block_given? ? yield(tok) : tok
871
+ else
872
+ return nil
873
+ end
874
+ end
875
+
876
+ def get_optional_escaped_id expectation
877
+ escaped_p = pass? :'\\'
878
+ if escaped_p then
879
+ return true, get_id(expectation)
880
+ elsif id = get_optional_id then
881
+ return false, id
882
+ else
883
+ return nil
884
+ end
885
+ end
886
+
887
+ def get_id expectation
888
+ return (get_optional_id or expected! expectation)
889
+ end
890
+
891
+ def pass? etalon
892
+ tok = peek_token
893
+ if tok == etalon then
894
+ _consume_token_ahead
895
+ return true
896
+ else
897
+ return false
898
+ end
899
+ end
900
+
901
+ def pass etalon
902
+ pass? etalon or expected! etalon
903
+ return
904
+ end
905
+
906
+ def last_token_offset
907
+ return @offset_behind
908
+ end
909
+
910
+ def next_token_offset
911
+ _skip_intertoken_space \
912
+ unless @token_ahead
913
+ return @offset_ahead
914
+ end
915
+
916
+ def expected_eof!
917
+ expected! '*eof*' unless next_token_offset >= @expr.length
918
+ return
919
+ end
920
+ end
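The scanner above tries its token classes in a fixed order: identifiers, then multi-character particles, then single-character particles. The following sketch restates that order as an eager tokenizer. The name `tokenize_sketch` is hypothetical. Unlike [[Ferret::Scanner]], it has no one-token lookahead and does not track offsets; it reuses the character sets from [[Ferret::LEXICAL_RULESET]] as local variables.

```ruby
# Hypothetical eager tokenizer (not part of sql-ferret). Identifiers come
# out as Strings and particles as Symbols, as in Ferret::Scanner.
def tokenize_sketch(expr)
  simple = ",:*()'\\<>"
  multichar = %w{-> <-> <= >=}
  intertoken = " \t\n\r\f"
  tokens = []
  i = 0
  while i < expr.length
    c = expr[i]
    if intertoken.include?(c)
      i += 1                          # skip intertoken space
    elsif c =~ /[A-Za-z_]/            # identifier starter
      j = i + 1
      j += 1 while j < expr.length && expr[j] =~ /[A-Za-z0-9_]/
      tokens << expr[i...j]
      i = j
    elsif (m = multichar.find { |p| expr[i, p.length] == p })
      tokens << m.to_sym              # multi-char particle
      i += m.length
    elsif simple.include?(c)
      tokens << c.to_sym              # single-char particle
      i += 1
    else
      raise ArgumentError, "lexical error at offset #{i}"
    end
  end
  tokens
end

p tokenize_sketch('people: name -> age')
```

Checking multi-character particles before single-character ones matters because `<` and `>` are also valid particles on their own.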
921
+
922
+ class Ferret::Expression
923
+ attr_reader :stages
924
+ attr_reader :selectees
925
+ attr_reader :exemplars
926
+ attr_accessor :multicolumn
927
+ attr_accessor :type
928
+
929
+ def initialize
930
+ super()
931
+ @stages = [Ferret::Stage.new(nil, nil, :left)]
932
+ @selectees = []
933
+ @exemplars = []
934
+ @multicolumn = false
935
+ @type = :select # the default
936
+ return
937
+ end
938
+
939
+ def assign_stage_qualifiers ag
940
+ raise 'type mismatch' \
941
+ unless ag.is_a? Ferret::Alias_Generator
942
+ table_visit_counts = Hash.new 0 # name => count
943
+ @stages.each do |stage|
944
+ table_visit_counts[stage.table.name] += 1
945
+ end
946
+
947
+ # The tables that we visited more than once need
948
+ # distinguishing names.
949
+ @stages.each do |stage|
950
+ stage.qualifier =
951
+ if table_visit_counts[stage.table.name] > 1 then
952
+ ag.create stage.table.name[0]
953
+ else
954
+ stage.table.name
955
+ end
956
+ end
957
+ return
958
+ end
959
+
960
+ def from_clause
961
+ clause = "from "
962
+ @stages.each_with_index do |stage, i|
963
+ # In case of a non-query expression -- a modification --,
964
+ # the last stage is empty and mustn't be joined. It then
965
+ # serves only the purpose of holding the last stalk.
966
+ break if i == @stages.length - 1 and modification?
967
+
968
+ unless i.zero? then
969
+ clause << " #{stage.join_type} join "
970
+ end
971
+
972
+ clause << stage.table.name << " as " << stage.qualifier
973
+
974
+ unless i.zero? then
975
+ clause << " on %s.%s = %s.%s" % [
976
+ stage.parent.qualifier,
977
+ (stage.stalk.haunt || stage.stalk).name,
978
+ stage.qualifier, stage.stalk.ref.name,
979
+ ]
980
+ end
981
+ end
982
+ return clause
983
+ end
984
+
985
+ def where_clause
986
+ raise 'assertion failed' if @exemplars.empty?
987
+ clause = "where "
988
+ @exemplars.each_with_index do |exemplar, i|
989
+ clause << " and " unless i.zero?
990
+ # The qualifier is only necessary if the clause has more
991
+ # than one stage.
992
+ if @stages.length > 1 then
993
+ # In the navigational model, the (primary) filter always
994
+ # lives in the zeroth stage.
995
+ clause << @stages[0].qualifier << "."
996
+ end
997
+ clause << exemplar.column.name << " [test #{i}]"
998
+ end
999
+ return clause
1000
+ end
1001
+
1002
+ # Prepare a [[select]] statement as an
1003
+ # [[Annotated_SQL_Template]]. If this expression represents a
1004
+ # query statement, the result will cover the whole query. If
1005
+ # it represents an update statement, the result will cover the
1006
+ # subquery that determines key value(s) of records in the last
1007
+ # table to update.
1008
+ def select
1009
+ qualifiers_needed =
1010
+ @stages.length != (modification? ? 2 : 1)
1011
+ sql_selectees = @selectees.map do |selectee|
1012
+ (qualifiers_needed ?
1013
+ selectee.stage.qualifier + "." : "") +
1014
+ (selectee.field.haunt || selectee.field).name
1015
+ end.join(', ')
1016
+
1017
+ outputs = {}
1018
+ @selectees.each do |selectee|
1019
+ outputs[selectee.output_name.to_sym] =
1020
+ selectee.interpretation
1021
+ end
1022
+
1023
+ sql = "select"
1024
+ sql << " distinct" if @type == :select_distinct
1025
+ sql << " " << sql_selectees << " " << from_clause
1026
+
1027
+ sql << " " << where_clause unless @exemplars.empty?
1028
+
1029
+ # Determine the shape of the table
1030
+ shape = 0
1031
+ shape |= QSF_MULTICOL if @multicolumn
1032
+ # If no [[unique]] exemplar field is specified or if any of
1033
+ # the joins is performed along a ghost field (i.e.,
1034
+ # possibly a 1->n reference), our result will have multiple
1035
+ # rows.
1036
+ shape |= QSF_MULTIROW \
1037
+ unless @exemplars.any?{|ex| ex.column.unique?} and
1038
+ !@stages[1 .. -1].any?{|stage| stage.stalk.ghost?}
1039
+
1040
+ return Ferret::Annotated_SQL_Template.new(sql,
1041
+ outputs, shape)
1042
+ end
1043
+
1044
+ include Ferret::Constants
1045
+
1046
+ def modification?
1047
+ case @type
1048
+ when :select, :select_distinct then
1049
+ return false
1050
+ when :update, :insert, :delete then
1051
+ return true
1052
+ else
1053
+ raise 'assertion failed'
1054
+ end
1055
+ end
1056
+ end
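As a rough illustration of what [[Expression#from_clause]] assembles, this sketch builds the same kind of clause from plain hashes instead of [[Ferret::Stage]] objects. The name `from_clause_sketch`, the hash keys, and the `books`/`authors` tables are all hypothetical. The special case for modification expressions (skipping the last, empty stage) is left out.

```ruby
# Hypothetical stand-in (not part of sql-ferret): each stage is a Hash
# rather than a Ferret::Stage, and the first stage has no join data.
def from_clause_sketch(stages)
  clause = String.new('from ')
  stages.each_with_index do |stage, i|
    clause << " #{stage[:join_type]} join " unless i.zero?
    clause << stage[:table] << ' as ' << stage[:qualifier]
    unless i.zero?
      # Join the parent's referring column to this stage's referred column.
      clause << ' on %s.%s = %s.%s' % [
        stage[:parent_qualifier], stage[:parent_column],
        stage[:qualifier], stage[:ref_column]]
    end
  end
  clause
end

stages = [
  { table: 'books', qualifier: 'books' },
  { table: 'authors', qualifier: 'authors', join_type: :left,
    parent_qualifier: 'books', parent_column: 'author', ref_column: 'id' },
]
puts from_clause_sketch(stages)
```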
1057
+
1058
+ class Ferret::Expression_Parser
1059
+ attr_reader :expr
1060
+
1061
+ def initialize raw_expr, schema
1062
+ super()
1063
+ @raw_expr = raw_expr
1064
+ @schema = schema
1065
+
1066
+ @expr = Ferret::Expression.new
1067
+ @scanner = Ferret::Scanner.new @raw_expr
1068
+
1069
+ @first_star_offset = nil
1070
+
1071
+ first_table_name = @scanner.get_id 'table-name'
1072
+ @expr.stages[0].table = @schema[first_table_name] or
1073
+ ugh 'unknown-table',
1074
+ table: first_table_name,
1075
+ offset: @scanner.last_token_offset,
1076
+ expr: @raw_expr
1077
+
1078
+ @scanner.pass :':'
1079
+
1080
+ parenthesised = @scanner.pass? :'('
1081
+ loop do
1082
+ exemplar_escaped, exemplar_column_name =
1083
+ @scanner.get_optional_escaped_id 'column-expected'
1084
+ if exemplar_column_name then
1085
+ exemplar_column =
1086
+ @expr.stages[0].table[exemplar_column_name] or
1087
+ ugh 'unknown-field',
1088
+ field: exemplar_column_name,
1089
+ table: @expr.stages[0].table.name,
1090
+ role: 'key-field',
1091
+ offset: @scanner.last_token_offset,
1092
+ expr: @raw_expr
1093
+ # the key column must be a column, not a ghost field
1094
+ unless exemplar_column.column? then
1095
+ ugh 'not-a-column', field: exemplar_column.name,
1096
+ table: @expr.stages[0].table.name,
1097
+ offset: @scanner.last_token_offset,
1098
+ expr: @raw_expr
1099
+ end
1100
+ exemplar_interpretation = exemplar_escaped ?
1101
+ nil : exemplar_column.interpretation
1102
+ key_output_name =
1103
+ parse_optional_output_name_override ||
1104
+ exemplar_column_name
1105
+ @expr.exemplars.push Ferret::Exemplar.new(
1106
+ exemplar_column, exemplar_interpretation)
1107
+ @expr.selectees.push Ferret::Selectee.new(
1108
+ @expr.stages[0], exemplar_column,
1109
+ key_output_name, exemplar_interpretation)
1110
+ end
1111
+ break unless parenthesised and @scanner.pass? :','
1112
+ end
1113
+ @scanner.pass :')' if parenthesised
1114
+
1115
+ if @scanner.pass? :':' then
1116
+ # Colon without dereference: we should expect a fetch
1117
+ # verb.
1118
+ @expr.type = parse_fetch_verb
1119
+ else
1120
+ @scanner.pass :'->'
1121
+
1122
+ if @scanner.pass? :':' then
1123
+ # Colon past dereference: we should expect an update
1124
+ # verb.
1125
+ @expr.type = parse_update_verb
1126
+ else
1127
+ # Note that [[parse_stage]] can change [[@expr.type]]
1128
+ # if it meets the [[-> :]].
1129
+ parse_stage @expr.stages.last,
1130
+ parens: false
1131
+ end
1132
+ end
1133
+
1134
+ if @expr.modification? then
1135
+ if @first_star_offset then
1136
+ ugh 'star-in-modification',
1137
+ offset: @first_star_offset,
1138
+ expr: @raw_expr
1139
+ end
1140
+ end
1141
+
1142
+ @scanner.expected_eof!
1143
+
1144
+ if @expr.modification? then
1145
+ ugh 'multiple-columns-selected-in-modification' \
1146
+ if @expr.multicolumn
1147
+ end
1148
+
1149
+ unless @expr.multicolumn then
1150
+ # In single-column expressions, only the very last
1151
+ # selectee is actually selected.
1152
+ @expr.selectees[0 ... -1] = []
1153
+ end
1154
+
1155
+ @expr.assign_stage_qualifiers @schema.alias_generator
1156
+ return
1157
+ end
1158
+
1159
+ def start_subsequent_stage parent, stalk, join_type
1160
+ raise 'type mismatch' \
1161
+ unless parent.is_a? Ferret::Stage
1162
+ raise 'type mismatch' \
1163
+ unless stalk.is_a? Ferret::Field
1164
+ raise 'assertion failed' \
1165
+ unless [:left, :inner].include? join_type
1166
+
1167
+ # Note that we don't have the field's offset. But the
1168
+ # caller might.
1169
+ unless stalk.ref then
1170
+ ugh 'unable-to-dereference', field: stalk.name
1171
+ end
1172
+
1173
+ @expr.stages.push Ferret::Stage.new(
1174
+ parent, stalk, join_type)
1175
+ return
1176
+ end
1177
+
1178
+ def parse_stage stage, parens: false
1179
+ starred = false
1180
+ stage_empty = true
1181
+ loop do
1182
+ field_escaped, field_name =
1183
+ @scanner.get_optional_escaped_id 'field-expected'
1184
+ if field_name then
1185
+ field_offset = @scanner.last_token_offset
1186
+
1187
+ raise 'assertion failed' unless stage.table
1188
+
1189
+ field = stage.table[field_name] or
1190
+ ugh 'unknown-field', field: field_name,
1191
+ expr: @raw_expr,
1192
+ offset: field_offset
1193
+
1194
+ field_output_name =
1195
+ parse_optional_output_name_override ||
1196
+ field_name
1197
+
1198
+ # Has this column, or its name, been used already?
1199
+ (0 ... @expr.selectees.length).reverse_each do |i|
1200
+ selectee = @expr.selectees[i]
1201
+ if (selectee.stage == stage and
1202
+ selectee.field == field) or
1203
+ selectee.output_name == field_output_name then
1204
+ # Possible conflict detected.
1205
+ if selectee.star? then
1206
+ # The previous selectee was implicit, added due
1207
+ # to star expansion. We'll just discard it, for
1208
+ # explicit fields take precedence.
1209
+ @expr.selectees.delete_at i
1210
+ else
1211
+ ugh 'duplicate-field-in-stage',
1212
+ field: field.name,
1213
+ output_name: field_output_name,
1214
+ expr: @raw_expr,
1215
+ offset: field_offset
1216
+ end
1217
+ end
1218
+ end
1219
+ @expr.selectees.push Ferret::Selectee.new(
1220
+ stage, field,
1221
+ field_output_name, field_escaped ?
1222
+ nil : field.interpretation)
1223
+ stage_empty = false
1224
+ if @scanner.pass? :'(' then
1225
+ join_type = parse_optional_join_arrow or
1226
+ @scanner.expected!('join-arrow',
1227
+ candidates: '-> <->')
1228
+ # If something goes wrong trying to start a new
1229
+ # stage, it must be the last field's fault.
1230
+ # ([[start_subsequent_stage]] won't attach the
1231
+ # offset to the ugh on its own just because it
1232
+ # doesn't _have_ the offset.)
1233
+ ugh? offset: field_offset do
1234
+ start_subsequent_stage stage, field, join_type
1235
+ end
1236
+ parse_stage @expr.stages.last,
1237
+ parens: true
1238
+ @scanner.pass :')'
1239
+ else
1240
+ if !parens and @scanner.pass? :':' then
1241
+ @expr.type = parse_fetch_verb
1242
+ break
1243
+ end
1244
+ if join_type = parse_optional_join_arrow then
1245
+ ugh? offset: field_offset do
1246
+ start_subsequent_stage stage, field, join_type
1247
+ end
1248
+ if !parens and @scanner.pass? :':' then
1249
+ @expr.type = parse_update_verb
1250
+ else
1251
+ parse_stage @expr.stages.last,
1252
+ parens: parens
1253
+ end
1254
+ break
1255
+ end
1256
+ end
1257
+ elsif @scanner.pass? :'*' then
1258
+ @first_star_offset ||= @scanner.last_token_offset
1259
+ # only one star per stage
1260
+ @scanner.expected! 'field-name' if starred
1261
+ starred = true
1262
+ @expr.multicolumn = true
1263
+
1264
+ stage.table.columns.each do |column|
1265
+ # We'll skip columns that have been selected (at
1266
+ # this stage) already, or columns whose names have
1267
+ # already been used.
1268
+ next if @expr.selectees.any? do |selectee|
1269
+ (selectee.stage == stage and
1270
+ selectee.field == column) or
1271
+ selectee.output_name == column.name
1272
+ end
1273
+ @expr.selectees.push Ferret::Selectee.new(
1274
+ stage, column,
1275
+ column.name, column.interpretation,
1276
+ true)
1277
+ end
1278
+
1279
+ # Note that [[->]] can not appear immediately
1280
+ # following a [[*]].
1281
+ break
1282
+ else
1283
+ if stage_empty then
1284
+ @scanner.expected! 'field-name'
1285
+ end
1286
+ break
1287
+ end
1288
+
1289
+ if @scanner.pass? :',' then
1290
+ @expr.multicolumn = true
1291
+ else
1292
+ break
1293
+ end
1294
+ end
1295
+ return
1296
+ end
1297
+
1298
+ def parse_optional_output_name_override
1299
+ if @scanner.pass? :"'" then
1300
+ override = @scanner.get_id 'output-name-override'
1301
+ @scanner.pass :"'"
1302
+ return override
1303
+ else
1304
+ return nil
1305
+ end
1306
+ end
1307
+
1308
+ def parse_optional_join_arrow
1309
+ return :left if @scanner.pass? :'->'
1310
+ return :inner if @scanner.pass? :'<->'
1311
+ return nil
1312
+ end
1313
+
1314
+ def parse_fetch_verb
1315
+ verb = @scanner.get_id 'fetch-verb'
1316
+ case verb
1317
+ when 'select' then
1318
+ return @scanner.pass?('distinct') ?
1319
+ :select_distinct : :select
1320
+ when 'distinct' then
1321
+ return :select_distinct
1322
+ else
1323
+ ugh 'unknown-fetch-verb',
1324
+ got: verb,
1325
+ input: @raw_expr,
1326
+ offset: @scanner.last_token_offset
1327
+ end
1328
+ end
1329
+
1330
+ def parse_update_verb
1331
+ verb = @scanner.get_id 'update-verb'
1332
+ case verb
1333
+ when 'update', 'set' then
1334
+ return :update
1335
+ when 'delete' then
1336
+ return :delete
1337
+ else
1338
+ ugh 'unknown-update-verb',
1339
+ got: verb,
1340
+ input: @raw_expr,
1341
+ offset: @scanner.last_token_offset
1342
+ end
1343
+ end
1344
+ end
1345
+
1346
+ class Ferret::Stage
1347
+ attr_reader :parent
1348
+ attr_reader :stalk
1349
+ attr_reader :join_type
1350
+
1351
+ attr_accessor :table
1352
+ attr_accessor :qualifier
1353
+
1354
+ def initialize parent, stalk, join_type
1355
+ raise 'type mismatch' \
1356
+ unless parent.nil? or parent.is_a? Ferret::Stage
1357
+ raise 'type mismatch' \
1358
+ unless parent.nil? ?
1359
+ stalk.nil? : stalk.is_a?(Ferret::Field)
1360
+ raise 'assertion failed' \
1361
+ unless [:left, :inner].include? join_type
1362
+ super()
1363
+ @parent = parent
1364
+ @stalk = stalk
1365
+ @join_type = join_type
1366
+
1367
+ # If we have a stalk, it identifies this stage's table.
1368
+ # If not (which only happens for the very first stage),
1369
+ # the parser will use [[table=]] to set the stage's table
1370
+ # a bit later.
1371
+ @table = stalk && stalk.ref.table
1372
+ @qualifier = nil
1373
+ return
1374
+ end
1375
+ end
1376
+
1377
+ class Ferret::Field
1378
+ def inspect
1379
+ result = "#<Ferret::Field #{@table.name}.#{name}: "
1380
+ if primary_key? then
1381
+ result << 'primary key '
1382
+ else
1383
+ result << 'optional ' if optional?
1384
+ result << 'unique ' if unique?
1385
+ end
1386
+ if reference? then
1387
+ result << 'unconstrained ' if unconstrained?
1388
+ result << "ghost #{@haunt.name} " if ghost?
1389
+ result << 'ref %s(%s)' % [ref.table.name, ref.name]
1390
+ else
1391
+ result << (interpretation || type).to_s
1392
+ end
1393
+ # Note that [[default]] is an unsanitised, unprocessed
1394
+ # string extracted from the schema. In pathological cases,
1395
+ # it can potentially contain the [[>]] character.
1396
+ result << " = #{default}" if default
1397
+ result << '>'
1398
+ end
1399
+
1400
+ attr_reader :table
1401
+
1402
+ attr_reader :name
1403
+
1404
+ attr_reader :type
1405
+
1406
+ attr_reader :interpretation
1407
+
1408
+ def unique?
1409
+ return (@flags & (FF_PRIMARY_KEY | FF_EXPL_UNIQUE)) != 0
1410
+ end
1411
+
1412
+ def optional?
1413
+ return (@flags & FF_OPTIONAL) != 0
1414
+ end
1415
+
1416
+ def primary_key?
1417
+ return (@flags & FF_PRIMARY_KEY) != 0
1418
+ end
1419
+
1420
+ def unconstrained?
1421
+ return (@flags & FF_UNCONSTRAINED) != 0
1422
+ end
1423
+
1424
+ def reference?
1425
+ return (@flags & FF_REFERENCE) != 0
1426
+ end
1427
+
1428
+ def ghost?
1429
+ return (@flags & FF_GHOST) != 0
1430
+ end
1431
+
1432
+ def column?
1433
+ return (@flags & FF_GHOST) == 0
1434
+ end
1435
+
1436
+ attr_reader :haunt
1437
+
1438
+ attr_reader :default
1439
+
1440
+ attr_reader :ref
1441
+
1442
+ # Note that the parser does not look up the referred and
1443
+ # haunted columns: at parsing time, not all the columns are
1444
+ # available yet, so trying to look up forward references
1445
+ # would spuriously fail. Instead, it creates 'relocation
1446
+ # thunks' and [[yield]]:s them to the caller. The caller
1447
+ # must arrange to have them called (in the same order as
1448
+ # they were [[yield]]:ed) after the whole schema has been
1449
+ # loaded; the thunks then perform these lookups and fill in
1450
+ # the corresponding slots in the structure.
1451
+ def initialize table, name, spec, &thunk
1452
+ raise 'type mismatch' unless table.is_a? Ferret::Table
1453
+ raise 'type mismatch' unless name.is_a? String
1454
+ raise 'type mismatch' unless spec.is_a? String
1455
+ super()
1456
+ @table = table
1457
+ @name = name
1458
+ unless spec.strip =~ %r{\A
1459
+ (
1460
+ | (?<primary_key> \b primary \s+ key \s* ,)
1461
+ | (?<unique> \b unique \b)
1462
+ | (?<optional> \b optional \b)
1463
+ | \b ghost \b \s* (?<haunt> \b \w+ \b)
1464
+ )\s*
1465
+ ( (?<type> \b \w+ \b)
1466
+ | (?<unconstrained> \b unconstrained \b \s*)?
1467
+ \b ref \b \s* (?<ref_table> \w+)
1468
+ ( \s* \( \s* (?<ref_field> \w+) \s* \) )?
1469
+ )
1470
+ ( \s* = \s* (?<default> [^\s].*) )?
1471
+ \Z}x then
1472
+ ugh 'invalid-field-specification',
1473
+ input: spec
1474
+ end
1475
+
1476
+ unless $~['haunt'] then
1477
+ # Do we know the type?
1478
+ if $~['type'] and !%w{
1479
+ integer real varchar text blob iso8601
1480
+ unix_time subsecond_unix_time
1481
+ json pretty_json yaml
1482
+ ruby_marshal packed_hex}.include? $~['type'] then
1483
+ ugh 'unknown-type', type: $~['type']
1484
+ end
1485
+ else
1486
+ # The regex above is a bit too permissive.
1487
+ if $~['type'] or $~['unconstrained'] or $~['default'] then
1488
+ ugh 'invalid-field-specification',
1489
+ input: spec
1490
+ end
1491
+ end
1492
+
1493
+ if $~['primary_key'] and
1494
+ ($~['ref_table'] or $~['default']) then
1495
+ ugh 'invalid-field-specification',
1496
+ input: spec
1497
+ end
1498
+
1499
+ @flags = 0
1500
+ @flags |= FF_PRIMARY_KEY if $~['primary_key']
1501
+ @flags |= FF_EXPL_UNIQUE if $~['unique']
1502
+ @flags |= FF_OPTIONAL if $~['optional']
1503
+
1504
+ # The current [[$~]] is unlikely to survive until the
1505
+ # relocation thunk gets called, so we'll have to copy
1506
+ # [[ref_table]] and [[ref_field]] out of it, into local
1507
+ # variables.
1508
+ if ref_table_name = $~['ref_table'] then
1509
+ @flags |= FF_REFERENCE
1510
+ ref_field_name = $~['ref_field']
1511
+ yield(proc do |schema|
1512
+ raise 'assertion failed' if @ref
1513
+ ref_table = schema[ref_table_name]
1514
+ ugh 'unknown-table', table: ref_table_name \
1515
+ unless ref_table
1516
+ ugh? referring_field: @name,
1517
+ referring_field_table: @table.name do
1518
+ if ref_field_name then
1519
+ @ref = ref_table[ref_field_name] or
1520
+ ugh 'unknown-field', field: ref_field_name,
1521
+ table: ref_table.name,
1522
+ significance: 'referred'
1523
+ else
1524
+ @ref = ref_table.primary_key or
1525
+ ugh 'no-primary-key', table: ref_table.name,
1526
+ significance: 'referred'
1527
+ end
1528
+ ugh 'not-a-column', field: @ref.name,
1529
+ table: ref.table.name,
1530
+ significance: 'referred' \
1531
+ unless @ref.column?
1532
+ end
1533
+ @type = @ref.type
1534
+ end)
1535
+ else
1536
+ @type = $~['type']
1537
+ end
1538
+
1539
+ if haunt = $~['haunt'] then
1540
+ @flags |= FF_GHOST
1541
+ yield(proc do |schema|
1542
+ ugh? significance: 'relied-on-by-ghost-field',
1543
+ ghost_field: @name do
1544
+ @haunt = @table[haunt]
1545
+ unless @haunt then
1546
+ ugh 'unknown-field', field: haunt
1547
+ end
1548
+ unless @haunt.column? then
1549
+ ugh 'not-a-column', field: @haunt.name
1550
+ end
1551
+ @type ||= @haunt.type
1552
+ unless @haunt.type == @type then
1553
+ ugh 'ghost-field-type-mismatch',
1554
+ field: @name,
1555
+ table: @table.name,
1556
+ type: @type.downcase,
1557
+ haunted_column: @haunt.name,
1558
+ haunted_column_type: @haunt.type.downcase
1559
+ end
1560
+ end
1561
+ end)
1562
+ end
1563
+
1564
+ @flags |= FF_UNCONSTRAINED if $~['unconstrained']
1565
+ @default = $~['default']
1566
+
1567
+ if @type then
1568
+ # [[@type]] can be [[nil]] if it's a reference field.
1569
+ # Then, the type and interpretation will be later copied
1570
+ # from the referred column.
1571
+ case @type.downcase
1572
+ when 'iso8601', 'json' then
1573
+ @interpretation = @type.downcase.to_sym
1574
+ @type = 'varchar'
1575
+ when 'yaml', 'pretty_json' then
1576
+ @interpretation = @type.downcase.to_sym
1577
+ @type = 'text'
1578
+ when 'ruby_marshal' then
1579
+ @interpretation = @type.downcase.to_sym
1580
+ @type = 'blob'
1581
+ when 'unix_time' then
1582
+ @interpretation = @type.downcase.to_sym
1583
+ @type = 'integer'
1584
+ when 'subsecond_unix_time' then
1585
+ @interpretation = @type.downcase.to_sym
1586
+ @type = 'real'
1587
+ else
1588
+ @interpretation = nil
1589
+ end
1590
+ end
1591
+
1592
+ return
1593
+ end
1594
+
1595
+ # [[Ferret::Field]] flags
1596
+ FF_PRIMARY_KEY = 0x01
1597
+ FF_EXPL_UNIQUE = 0x02
1598
+ FF_OPTIONAL = 0x04
1599
+ FF_UNCONSTRAINED = 0x08
1600
+ FF_REFERENCE = 0x10
1601
+ FF_GHOST = 0x20
1602
+
1603
+ def sql_to_declare
1604
+ sql = "#@name #@type"
1605
+ if primary_key? then
1606
+ sql << " primary key"
1607
+ else
1608
+ sql << " unique" if unique?
1609
+ sql << " not null" unless optional?
1610
+ sql << " default #@default" if default
1611
+ end
1612
+ if reference? and !unconstrained? then
1613
+ sql << "\n references %s(%s)" %
1614
+ [@ref.table.name, @ref.name]
1615
+ end
1616
+ return sql
1617
+ end
1618
+ end
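The [[case]] statement near the end of [[Field#initialize]] maps each declared Ferret type to an SQLite storage type plus an optional interpretation symbol. The sketch below restates that mapping as a pure function. The name `storage_type_sketch` is hypothetical, and the function returns a pair instead of setting instance variables.

```ruby
# Hypothetical helper (not part of sql-ferret): returns
# [storage_type, interpretation] for a declared field type, following
# the mapping in Field#initialize.
def storage_type_sketch(declared_type)
  t = declared_type.downcase
  case t
  when 'iso8601', 'json' then ['varchar', t.to_sym]
  when 'yaml', 'pretty_json' then ['text', t.to_sym]
  when 'ruby_marshal' then ['blob', t.to_sym]
  when 'unix_time' then ['integer', t.to_sym]
  when 'subsecond_unix_time' then ['real', t.to_sym]
  else [declared_type, nil] # plain SQL types are stored as declared
  end
end

p storage_type_sketch('json')
p storage_type_sketch('integer')
```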
1619
+
1620
+ # [[sql]] is a [[String]] of the SQL template together with
1621
+ # placeholders. [[outputs]] is [[nil]] if this SQL is not a
1622
+ # query, or a [[Hash]] containing name->interpretation
1623
+ # mappings (in the order the values are [[select]]:ed by this
1624
+ # SQL statement) if it is. [[shape]] describes the expected
1625
+ # result set that would result, assuming all the inputs are
1626
+ # single values.
1627
+ Ferret::Annotated_SQL_Template =
1628
+ Struct.new :sql, :outputs, :shape
1629
+
1630
+ class Ferret::Selectee
1631
+ attr_reader :stage
1632
+ attr_reader :field
1633
+ attr_reader :output_name
1634
+ attr_reader :interpretation
1635
+
1636
+ def initialize stage, field,
1637
+ output_name, interpretation,
1638
+ star_p = false
1639
+ raise 'type mismatch' unless field.is_a? Ferret::Field
1640
+ raise 'type mismatch' unless output_name.is_a? String
1641
+ super()
1642
+ @stage = stage
1643
+ @field = field
1644
+ @output_name = output_name
1645
+ @interpretation = interpretation
1646
+ @star_p = star_p
1647
+ return
1648
+ end
1649
+
1650
+ def star?
1651
+ return @star_p
1652
+ end
1653
+ end
1654
+
1655
+ class Ferret::Exemplar
1656
+ attr_reader :column
1657
+ attr_reader :interpretation
1658
+
1659
+ def initialize column, interpretation
1660
+ raise 'type mismatch' unless column.is_a? Ferret::Field
1661
+ raise 'assertion failed' unless column.column?
1662
+ raise 'type mismatch' \
1663
+ unless interpretation.nil? \
1664
+ or interpretation.is_a? Symbol
1665
+ super()
1666
+ @column = column
1667
+ @interpretation = interpretation
1668
+ return
1669
+ end
1670
+ end
1671
+
1672
+ class Ferret::Parameter_Collector < Array
1673
+ # [[parameter]] can be a plain value, a collection
1674
+ # ([[Enumerator]]) of values, or a [[nil]].
1675
+ def feed parameter, exemplar_spec
1676
+ raise 'type mismatch' \
1677
+ unless exemplar_spec.is_a? Ferret::Exemplar
1678
+ if parameter.nil? and exemplar_spec.column.optional? then
1679
+ test = "is " + _feed(nil)
1680
+ selects_one_p = false
1681
+ else
1682
+ *exemplar_values = *parameter # force to array
1683
+ exemplar_values.map! do |value|
1684
+ Ferret.deterpret exemplar_spec.interpretation, value
1685
+ end
1686
+ if exemplar_values.length != 1 then
1687
+ test = "in ("
1688
+ exemplar_values.each_with_index do |value, i|
1689
+ test << ", " unless i.zero?
1690
+ test << _feed(value)
1691
+ end
1692
+ test << ")"
1693
+ selects_one_p = false
1694
+ else
1695
+ test = "= " + _feed(exemplar_values.first)
1696
+ selects_one_p = exemplar_spec.column.unique?
1697
+ end
1698
+ end
1699
+ return test, selects_one_p
1700
+ end
1701
+
1702
+ # Add the given [[parameter]] to this collector and return a
1703
+ # string containing its placeholder, in the form of colon
1704
+ # followed by a sequential number (0-based).
1705
+ def _feed parameter
1706
+ placeholder = length
1707
+ push parameter
1708
+ return ":#{placeholder}"
1709
+ end
1710
+ private :_feed
1711
+
1712
+ def to_hash
1713
+ h = {}
1714
+ each_with_index do |parameter, i|
1715
+ h[i.to_s.to_sym] = parameter
1716
+ end
1717
+ return h
1718
+ end
1719
+ end
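To show how [[Parameter_Collector#feed]] turns values into numbered placeholders, here is a simplified sketch. The name `placeholder_test_sketch` is hypothetical. It omits the [[is]] branch for [[nil]] on optional columns, the [[Ferret.deterpret]] conversion, and the uniqueness flag, and keeps only the placeholder numbering and the choice between [[=]] and [[in (...)]].

```ruby
# Hypothetical helper (not part of sql-ferret): appends the given value(s)
# to `collected` and returns the SQL test fragment, using 0-based numbered
# placeholders as Parameter_Collector#_feed does.
def placeholder_test_sketch(values, collected)
  marks = Array(values).map do |value|
    collected << value
    ":#{collected.length - 1}"
  end
  marks.length == 1 ? "= #{marks.first}" : "in (#{marks.join(', ')})"
end

collected = []
puts placeholder_test_sketch(7, collected)
puts placeholder_test_sketch([1, 2], collected)
```

Because the placeholder number is simply the value's index in the collector, the collector can later be turned into a bind hash, which is what [[to_hash]] does.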