alf 0.9.0

Files changed (94)
  1. data/CHANGELOG.md +5 -0
  2. data/Gemfile +2 -0
  3. data/Gemfile.lock +42 -0
  4. data/LICENCE.md +22 -0
  5. data/Manifest.txt +15 -0
  6. data/README.md +769 -0
  7. data/Rakefile +23 -0
  8. data/TODO.md +26 -0
  9. data/alf.gemspec +191 -0
  10. data/alf.noespec +30 -0
  11. data/bin/alf +31 -0
  12. data/examples/autonum.alf +6 -0
  13. data/examples/cities.rash +4 -0
  14. data/examples/clip.alf +3 -0
  15. data/examples/compact.alf +2 -0
  16. data/examples/database.alf +6 -0
  17. data/examples/defaults.alf +3 -0
  18. data/examples/extend.alf +3 -0
  19. data/examples/group.alf +3 -0
  20. data/examples/intersect.alf +4 -0
  21. data/examples/join.alf +2 -0
  22. data/examples/minus.alf +8 -0
  23. data/examples/nest.alf +2 -0
  24. data/examples/nulls.rash +3 -0
  25. data/examples/parts.rash +6 -0
  26. data/examples/project.alf +2 -0
  27. data/examples/quota.alf +4 -0
  28. data/examples/rename.alf +3 -0
  29. data/examples/restrict.alf +2 -0
  30. data/examples/runall.sh +26 -0
  31. data/examples/schema.yaml +28 -0
  32. data/examples/sort.alf +4 -0
  33. data/examples/summarize.alf +16 -0
  34. data/examples/suppliers.rash +5 -0
  35. data/examples/supplies.rash +12 -0
  36. data/examples/ungroup.alf +4 -0
  37. data/examples/union.alf +3 -0
  38. data/examples/unnest.alf +4 -0
  39. data/examples/with.alf +23 -0
  40. data/lib/alf.rb +2984 -0
  41. data/lib/alf/loader.rb +1 -0
  42. data/lib/alf/renderer/text.rb +153 -0
  43. data/lib/alf/renderer/yaml.rb +22 -0
  44. data/lib/alf/version.rb +14 -0
  45. data/spec/aggregator_spec.rb +62 -0
  46. data/spec/alf_spec.rb +47 -0
  47. data/spec/assumptions_spec.rb +15 -0
  48. data/spec/environment/explicit_spec.rb +15 -0
  49. data/spec/environment/folder_spec.rb +30 -0
  50. data/spec/examples_spec.rb +26 -0
  51. data/spec/lispy_spec.rb +23 -0
  52. data/spec/operator/command_methods_spec.rb +38 -0
  53. data/spec/operator/non_relational/autonum_spec.rb +61 -0
  54. data/spec/operator/non_relational/clip_spec.rb +49 -0
  55. data/spec/operator/non_relational/compact/buffer_based.rb +30 -0
  56. data/spec/operator/non_relational/compact/sort_based_spec.rb +30 -0
  57. data/spec/operator/non_relational/compact_spec.rb +38 -0
  58. data/spec/operator/non_relational/defaults_spec.rb +55 -0
  59. data/spec/operator/non_relational/sort_spec.rb +66 -0
  60. data/spec/operator/relational/extend_spec.rb +34 -0
  61. data/spec/operator/relational/group_spec.rb +54 -0
  62. data/spec/operator/relational/intersect_spec.rb +58 -0
  63. data/spec/operator/relational/join/hash_based_spec.rb +63 -0
  64. data/spec/operator/relational/minus_spec.rb +56 -0
  65. data/spec/operator/relational/nest_spec.rb +32 -0
  66. data/spec/operator/relational/project_spec.rb +65 -0
  67. data/spec/operator/relational/quota_spec.rb +44 -0
  68. data/spec/operator/relational/rename_spec.rb +32 -0
  69. data/spec/operator/relational/restrict_spec.rb +56 -0
  70. data/spec/operator/relational/summarize/sort_based_spec.rb +31 -0
  71. data/spec/operator/relational/summarize_spec.rb +41 -0
  72. data/spec/operator/relational/ungroup_spec.rb +35 -0
  73. data/spec/operator/relational/union_spec.rb +35 -0
  74. data/spec/operator/relational/unnest_spec.rb +32 -0
  75. data/spec/reader/alf_file_spec.rb +15 -0
  76. data/spec/reader/input.rb +2 -0
  77. data/spec/reader/rash_spec.rb +31 -0
  78. data/spec/reader_spec.rb +27 -0
  79. data/spec/renderer/text/cell_spec.rb +34 -0
  80. data/spec/renderer/text/row_spec.rb +30 -0
  81. data/spec/renderer/text/table_spec.rb +39 -0
  82. data/spec/renderer_spec.rb +42 -0
  83. data/spec/spec_helper.rb +26 -0
  84. data/spec/tools/ordering_key_spec.rb +81 -0
  85. data/spec/tools/projection_key_spec.rb +83 -0
  86. data/spec/tools/tools_spec.rb +25 -0
  87. data/spec/tools/tuple_handle_spec.rb +78 -0
  88. data/tasks/debug_mail.rake +78 -0
  89. data/tasks/debug_mail.txt +13 -0
  90. data/tasks/gem.rake +68 -0
  91. data/tasks/spec_test.rake +79 -0
  92. data/tasks/unit_test.rake +77 -0
  93. data/tasks/yard.rake +51 -0
  94. metadata +282 -0
data/CHANGELOG.md ADDED
@@ -0,0 +1,5 @@
1
+ # 0.9.0 / 2011.06.19
2
+
3
+ * Enhancements
4
+
5
+ * Birthday!
data/Gemfile ADDED
@@ -0,0 +1,2 @@
1
+ source 'http://rubygems.org'
2
+ gemspec :name => "alf"
data/Gemfile.lock ADDED
@@ -0,0 +1,42 @@
1
+ PATH
2
+ remote: .
3
+ specs:
4
+ alf (0.9.0)
5
+ quickl (~> 0.2.1)
6
+
7
+ GEM
8
+ remote: http://rubygems.org/
9
+ specs:
10
+ bluecloth (2.0.11)
11
+ diff-lcs (1.1.2)
12
+ highline (1.6.2)
13
+ noe (1.3.0)
14
+ highline (~> 1.6.0)
15
+ quickl (~> 0.2.0)
16
+ wlang (~> 0.10.1)
17
+ quickl (0.2.1)
18
+ rake (0.8.7)
19
+ rspec (2.6.0)
20
+ rspec-core (~> 2.6.0)
21
+ rspec-expectations (~> 2.6.0)
22
+ rspec-mocks (~> 2.6.0)
23
+ rspec-core (2.6.4)
24
+ rspec-expectations (2.6.0)
25
+ diff-lcs (~> 1.1.2)
26
+ rspec-mocks (2.6.0)
27
+ wlang (0.10.2)
28
+ yard (0.7.2)
29
+
30
+ PLATFORMS
31
+ java
32
+ ruby
33
+
34
+ DEPENDENCIES
35
+ alf!
36
+ bluecloth (~> 2.0.9)
37
+ bundler (~> 1.0)
38
+ noe (~> 1.3.0)
39
+ rake (~> 0.8.7)
40
+ rspec (~> 2.6.0)
41
+ wlang (~> 0.10.1)
42
+ yard (~> 0.7.2)
data/LICENCE.md ADDED
@@ -0,0 +1,22 @@
1
+ # The MIT Licence
2
+
3
+ Copyright (c) 2011 - Bernard Lambeau
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/Manifest.txt ADDED
@@ -0,0 +1,15 @@
1
+ bin/**/*
2
+ examples/**/*
3
+ lib/**/*
4
+ spec/**/*
5
+ tasks/**/*
6
+ Rakefile
7
+ alf.gemspec
8
+ alf.noespec
9
+ CHANGELOG.md
10
+ Gemfile
11
+ Gemfile.lock
12
+ LICENCE.md
13
+ Manifest.txt
14
+ README.md
15
+ TODO.md
data/README.md ADDED
@@ -0,0 +1,769 @@
1
+ # Alf - Classy data-manipulation dressed in a DSL (+ commandline)
2
+
3
+ % [sudo] gem install alf
4
+ % alf --help
5
+
6
+ ## Links
7
+
8
+ * {http://rubydoc.info/github/blambeau/alf/master/frames} (read this file there!)
9
+ * {http://github.com/blambeau/alf} (source code)
10
+ * {http://revision-zero.org} (author's blog)
11
+
12
+ ## Description
13
+
14
+ Alf is a commandline tool and Ruby library to manipulate data with all the power
15
+ of a truly relational algebra approach. Objectives behind Alf are manifold:
16
+
17
+ * Pragmatically, Alf aims at being a useful commandline executable for
18
+ manipulating csv files, database records, or whatever looks like a (physical
19
+ representation of a) relation. See 'alf --help' for the list of available
20
+ commands and implemented relational operators.
21
+
22
+ % alf restrict suppliers -- "city == 'London'" | alf join cities
23
+
24
+ * Alf is also a 100% Ruby relational algebra implementation shipped with an
25
+ easy-to-use, powerful, functional DSL for compiling and evaluating relational queries.
26
+ Alf is not limited to simple scalar values, but admits values of arbitrary
27
+ complexity (under a few requirements about their implementation, see next
28
+ section). See 'alf --help' as well as .alf files in the examples directory
29
+ for syntactic examples.
30
+
31
+ Alf.lispy.compile{
32
+ (join (restrict :suppliers, lambda{ city == 'London' }), :cities)
33
+ }
34
+
35
+ * Alf is also an educational tool that I've written to draw people's attention
36
+ to relational theory, which is poorly understood (and poorly represented by SQL). The tool
37
+ is largely inspired by TUTORIAL D, the tutorial language of Chris Date and
38
+ Hugh Darwen in their books, more specifically in
39
+ {http://www.thethirdmanifesto.com/ The Third Manifesto (TTM)}.
40
+ However, Alf only provides an overview of the relational _algebra_ defined
41
+ there (Alf is neither a relational _database_, nor a relational _language_).
42
+ I hope that people (especially talented developers) will be sufficiently
43
+ enticed by features shown here to open that book, read it more deeply, and
44
+ implement new stuff around Date & Darwen's vision. Have a look at the result of
45
+ the following query for things that you'll never ever have in SQL (see also
46
+ 'alf help quota', 'alf help nest', 'alf help group', ...):
47
+
48
+ % alf --text summarize supplies --by=sid -- total "sum(:qty)" -- which "group(:pid)"
49
+
50
+ * Last, but not least, Alf is an attempt to help me test some research ideas and
51
+ communicate about them with people who already know (all or part of) the TTM
52
+ vision of relational theory. These people include members of the TTM mailing
53
+ list as well as other people implementing some of the TTM ideas (see
54
+ {https://github.com/dkubb/veritas Dan Kubb's Veritas project} for example). For
55
+ this reason, some features and/or operators are my own; they should be considered
56
+ 'research work in progress' and used with care, as they do not necessarily
57
+ conform to the TTM.
58
+
59
+ % alf --text quota supplies --by=sid --order=qty -- pos "count()"
60
+
61
+ ## Overview of relational theory
62
+
63
+ We quickly recall relational theory in this section, as described in the TTM
64
+ book. Readers not familiar with Date and Darwen's vision of relational theory
65
+ should probably read this section, even if fluent in SQL. Others can probably
66
+ skip it. A quick test?
67
+
68
+ > _A relation is a value, precisely a set of tuples, which are themselves values.
69
+ Therefore, a relation is immutable, not ordered, does not contain duplicates,
70
+ and does not have null/nil attributes._
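The definition above can be made concrete with a minimal plain-Ruby sketch. It illustrates the theory only and is not Alf's implementation. The `relation` helper is hypothetical: it models a relation as a frozen `Set` of frozen hashes and rejects any tuple that contains nil.

```ruby
require 'set'

# Hypothetical helper (not part of Alf): build a relation value as a frozen
# Set of frozen Hashes, rejecting tuples that contain nil.
def relation(*tuples)
  tuples.each do |t|
    raise ArgumentError, "nil is not a value: #{t.inspect}" if t.value?(nil)
  end
  Set.new(tuples.map { |t| t.dup.freeze }).freeze
end

r1 = relation({ sid: "S1", city: "London" },
              { sid: "S2", city: "Paris" },
              { sid: "S1", city: "London" }) # duplicate tuple
r2 = relation({ sid: "S2", city: "Paris" },
              { sid: "S1", city: "London" }) # same tuples, other order

puts r1.size     # duplicates collapse: 2
puts r1 == r2    # no ordering: true
puts r1.frozen?  # immutable: true
```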
71
+
72
+ Familiar? Skip. Otherwise, read on.
73
+
74
+ ### The example database
75
+
76
+ This README file shows a lot of examples built on top of the following suppliers
77
+ & parts database (almost identical to the original version in C.J. Date's database
78
+ books). By default, the alf command line is wired to this embedded example. All
79
+ examples shown here should therefore work immediately, if you want to reproduce
80
+ them!
81
+
82
+ % alf show database
83
+
84
+ +-------------------------------------+-------------------------------------------------+-------------------------+------------------------+
85
+ | :suppliers | :parts | :cities | :supplies |
86
+ +-------------------------------------+-------------------------------------------------+-------------------------+------------------------+
87
+ | +------+-------+---------+--------+ | +------+-------+--------+------------+--------+ | +----------+----------+ | +------+------+------+ |
88
+ | | :sid | :name | :status | :city | | | :pid | :name | :color | :weight | :city | | | :city | :country | | | :sid | :pid | :qty | |
89
+ | +------+-------+---------+--------+ | +------+-------+--------+------------+--------+ | +----------+----------+ | +------+------+------+ |
90
+ | | S1 | Smith | 20 | London | | | P1 | Nut | Red | 12.0000000 | London | | | London | England | | | S1 | P1 | 300 | |
91
+ | | S2 | Jones | 10 | Paris | | | P2 | Bolt | Green | 17.0000000 | Paris | | | Paris | France | | | S1 | P2 | 200 | |
92
+ | | S3 | Blake | 30 | Paris | | | P3 | Screw | Blue | 17.0000000 | Oslo | | | Athens | Greece | | | S1 | P3 | 400 | |
93
+ | | S4 | Clark | 20 | London | | | P4 | Screw | Red | 14.0000000 | London | | | Brussels | Belgium | | | S1 | P4 | 200 | |
94
+ | | S5 | Adams | 30 | Athens | | | P5 | Cam | Blue | 12.0000000 | Paris | | +----------+----------+ | | S1 | P5 | 100 | |
95
+ | +------+-------+---------+--------+ | | P6 | Cog | Red | 19.0000000 | London | | | | S1 | P6 | 100 | |
96
+ | | +------+-------+--------+------------+--------+ | | | S2 | P1 | 300 | |
97
+ | | | | | S2 | P2 | 400 | |
98
+ | | | | | S3 | P2 | 200 | |
99
+ | | | | | S4 | P2 | 200 | |
100
+ | | | | | S4 | P4 | 300 | |
101
+ | | | | | S4 | P5 | 400 | |
102
+ | | | | +------+------+------+ |
103
+ +-------------------------------------+-------------------------------------------------+-------------------------+------------------------+
104
+
105
+ Many people think that relational databases are necessarily 'flat', that is,
106
+ limited to simple scalar values in two-dimensional tables. This is wrong: most
107
+ SQL databases are indeed 'flat', but _relations_ (in the mathematical sense of
108
+ relational theory) are not! Look, **the example above is a relation!** It
109
+ 'contains' other relations as particular values, which, in turn, could
110
+ 'contain' relations or any other 'simple' or more 'complex' value... This is not
111
+ "flat" at all, after all :-)
112
+
113
+ ### Types and Values
114
+
115
+ To understand exactly what a relation is, one needs to remember elementary
116
+ notions of set theory and the concepts of _type_ and _value_.
117
+
118
+ * A _type_ is a finite set of values; it is not particularly ordered and, being
119
+ a set, it never contains two values that are considered equal.
120
+
121
+ * A _value_ is **immutable** (you cannot 'change' a value, in any way), has no
122
+ localization in time and space, and is always typed (that is, it is always
123
+ accompanied by some identification of the type it belongs to).
124
+
125
+ As you can see, _type_ and _value_ are not the same concepts as _class_ and
126
+ _object_, with which you are probably familiar. Alf considers that the
127
+ latter are _implementations_ of the former. Alf assumes _valid_ implementations
128
+ (equality and hash methods must be correct) and _valid_ usage (objects used for
129
+ representing values are kept immutable in practice). Alf _assumes_ this, but
130
+ does not _enforce_ it: it is your responsibility to use Alf in conformance with
131
+ these preconditions. That being said, if you want **arrays, colors, ranges, or
132
+ whatever in your relations**, just do it! You can even join on them, restrict on
133
+ them, summarize on them, and so on:
134
+
135
+ % alf extend suppliers -- chars "name.chars.to_a" | alf --text restrict -- "chars.last == 's'"
136
+
137
+ +------+-------+---------+--------+-----------------+
138
+ | :sid | :name | :status | :city | :chars |
139
+ +------+-------+---------+--------+-----------------+
140
+ | S2 | Jones | 10 | Paris | [J, o, n, e, s] |
141
+ | S5 | Adams | 30 | Athens | [A, d, a, m, s] |
142
+ +------+-------+---------+--------+-----------------+
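The same extend-then-restrict pipeline can be mimicked with plain Ruby enumerables. It shows why array-valued attributes cause no trouble as long as they are treated as immutable values. This is a hedged sketch that uses `Array#map` and `Array#select` on hand-typed tuples; it is not how Alf evaluates the command.

```ruby
suppliers = [
  { sid: "S1", name: "Smith", status: 20, city: "London" },
  { sid: "S2", name: "Jones", status: 10, city: "Paris" },
  { sid: "S3", name: "Blake", status: 30, city: "Paris" },
  { sid: "S4", name: "Clark", status: 20, city: "London" },
  { sid: "S5", name: "Adams", status: 30, city: "Athens" }
]

# extend: build new tuples with an extra, array-valued attribute
# (Hash#merge returns a new Hash, so the input tuples are left untouched)
extended = suppliers.map { |t| t.merge(chars: t[:name].chars.to_a.freeze) }

# restrict: keep only the tuples satisfying the predicate
result = extended.select { |t| t[:chars].last == "s" }

result.each { |t| puts "#{t[:sid]} #{t[:chars].join(', ')}" }
# S2 J, o, n, e, s
# S5 A, d, a, m, s
```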
143
+
144
+ A last, very important word about values. **Null/nil is not a value**. Strictly
145
+ speaking therefore, you may not use null/nil inside your data files or datasources
146
+ representing relations. That being said, Alf provides specific support for handling
147
+ them, because they appear in today's databases in practice and because Alf aims at
148
+ being a tool that helps you tackle _practical_ problems. See the section titled
149
+ "What is Alf exactly?" later.
150
+
151
+ ### Tuples and Relations
152
+
153
+ Tuples (aka records) and relations are values as well, which explains why you
154
+ can have them inside relations!
155
+
156
+ * Logically speaking, a tuple is a set of (attribute name, attribute value)
157
+ pairs. Moreover, it does not contain two attributes with the same name and is
158
+ **not particularly ordered**. Also, **a tuple is a _value_, and is therefore
159
+ immutable**. Last, but not least, a tuple **does not admit nulls/nils**. Tuples
160
+ in Alf are simply implemented as Ruby hashes, taken as tuple implementations.
161
+ Not all hashes are valid tuple implementations, of course (those containing nil
162
+ are not, for example). Alf _assumes_ valid tuples, but does not _enforce_ this
163
+ precondition. It's up to you to use Alf the right way! No support is or will
164
+ ever be provided for ordering tuple attributes. However, as hashes are ordered
165
+ in Ruby 1.9, Alf implements a best effort strategy to keep a friendly ordering
166
+ when rendering tuples and relations. This is a very good practical reason for
167
+ migrating to ruby 1.9 if not already done!
168
+
169
+ {:sid => "S1", :name => "Smith", :status => 20, :city => "London"}
170
+
171
+ * A _relation_ is a set of tuples. Being a set, a relation **never contains
172
+ duplicates** (unlike SQL that works on bags, not on sets) and is **not
173
+ particularly ordered**. Moreover, all tuples of a relation must have the same
174
+ _heading_, that is, the same set of attribute (name, type) pairs. Also, **a
175
+ relation is a _value_, is therefore immutable** and **does not admit null/nil**.
176
+ As Alf is mainly an implementation of relational algebra (see section below),
177
+ it loosely considers any iterator of tuples as a potentially valid relation
178
+ implementation (see later).
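Because tuples are Ruby hashes here, the "not particularly ordered" property comes from Ruby's own `Hash` semantics. Insertion order (kept since Ruby 1.9) is what a renderer can rely on for friendly output. Here is a small check that does not depend on Alf:

```ruby
t1 = { sid: "S1", name: "Smith", status: 20, city: "London" }
t2 = { city: "London", status: 20, name: "Smith", sid: "S1" }

# Equality and hashing ignore attribute order, so t1 and t2 are the same tuple
puts t1 == t2            # true
puts t1.hash == t2.hash  # true
puts [t1, t2].uniq.size  # 1

# ...but each Hash remembers insertion order, which is handy for display
puts t1.keys.inspect     # [:sid, :name, :status, :city]
```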
179
+
180
+ ### Relational Algebra
181
+
182
+ In classical algebra, you can do computations like <code>(5 + 2) - 3</code>. In
183
+ relational algebra, you can do similar things on relations. Alf uses a prefix,
184
+ functional programming-oriented syntax for algebra expressions:
185
+
186
+ (minus (union :suppliers, xxx), yyy)
187
+
188
+ All relational operators take relations as input operands and return a relation
189
+ as output. We say that the relational algebra is _closed_ under its operators.
190
+ In practice, it means that operands may always be sub-expressions, **always**.
191
+
192
+ (minus (union (restrict :suppliers, lambda{ zzz }), xxx), yyy)
193
+
194
+ In shell, the closure property means that you can pipe alf invocations the way
195
+ you want! The same query, in shell:
196
+
197
+ alf restrict suppliers -- "zzz" | alf union xxx | alf minus yyy
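Closure is easy to see in a toy model where every operator takes `Set`s of hashes and returns a `Set` of hashes, so any operand can be a nested call. The `restrict`, `union` and `minus` methods below are hypothetical stand-ins written for this illustration, not Alf's Ruby API. The invented `extra` and `excluded` relations play the roles of the `xxx` and `yyy` placeholders.

```ruby
require 'set'

# Hypothetical operators: each takes Set(s) of Hashes and returns a Set.
def restrict(rel, &predicate)
  Set.new(rel.select(&predicate))
end

def union(left, right)
  left | right
end

def minus(left, right)
  left - right
end

suppliers = Set[
  { sid: "S1", city: "London" },
  { sid: "S2", city: "Paris" },
  { sid: "S3", city: "Paris" },
  { sid: "S5", city: "Athens" }
]
extra    = Set[{ sid: "S9", city: "Oslo" }]  # stands for xxx
excluded = Set[{ sid: "S3", city: "Paris" }] # stands for yyy

# (minus (union (restrict :suppliers, lambda{ ... }), xxx), yyy)
result = minus(union(restrict(suppliers) { |t| t[:city] == "Paris" }, extra), excluded)
p result.map { |t| t[:sid] }.sort # ["S2", "S9"]
```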
198
+
199
+ ## What is Alf exactly?
200
+
201
+ The Third Manifesto defines a series of prescriptions, proscriptions and very
202
+ strong suggestions for designing a truly relational _language_, called a _D_,
203
+ as an alternative to SQL for managing relational databases. This goes far beyond
204
+ our objective with Alf, as we don't look at database aspects at all (persistence,
205
+ transactions, and so on) and don't actually define a programming language either
206
+ (only a small functional ruby DSL).
207
+
208
+ Alf must simply be interpreted as a ruby library implementing (a variant of)
209
+ Date and Darwen's relational algebra. This library is designed as a set of operator
210
+ implementations, that work as tuple iterators taking other tuple iterators as
211
+ input. Under the pre-condition that you provide them _valid_ tuple iterators as
212
+ input (no duplicates, no nil, + other preconditions on an operator basis), the
213
+ result is a valid iterator as well. Unless explicitly stated otherwise, any
214
+ behavior observed when not respecting these preconditions, even an interesting
215
+ behavior, is not guaranteed and can change with tiny version changes (see section
216
+ about versioning policy at the end of this file).
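The "iterators taking iterators" design can be sketched in a few lines of Ruby. The `Restrict` and `Extend` classes below are hypothetical and are not Alf's operator classes. Each one wraps an operand that responds to `each` and includes `Enumerable`. It only pulls tuples when it is itself iterated, so operators nest freely.

```ruby
# Hypothetical operator: yields the operand's tuples that satisfy a predicate.
class Restrict
  include Enumerable

  def initialize(operand, &predicate)
    @operand = operand
    @predicate = predicate
  end

  def each
    return to_enum(:each) unless block_given?
    @operand.each { |tuple| yield tuple if @predicate.call(tuple) }
  end
end

# Hypothetical operator: yields each tuple extended with computed attributes.
class Extend
  include Enumerable

  def initialize(operand, &computation)
    @operand = operand
    @computation = computation # must return a Hash of new attributes
  end

  def each
    return to_enum(:each) unless block_given?
    @operand.each { |tuple| yield tuple.merge(@computation.call(tuple)) }
  end
end

suppliers = [
  { sid: "S1", status: 20 },
  { sid: "S2", status: 10 },
  { sid: "S3", status: 30 }
]

# Nothing is computed until the outer operator is iterated.
op = Extend.new(Restrict.new(suppliers) { |t| t[:status] > 10 }) do |t|
  { double: t[:status] * 2 }
end

op.each { |t| p t.values_at(:sid, :double) }
# ["S1", 40]
# ["S3", 60]
```

Since each `each` call re-iterates its operand, the same `op` can be enumerated several times.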
217
+
218
+ ### In ruby
219
+
220
+ #
221
+ # Provided that :suppliers and :cities are valid relation representations
222
+ # (under the responsibility shared by you and the Reader and Environment
223
+ # subclasses you use -- see later), then,
224
+ #
225
+ op = Alf.lispy.compile{
226
+ (join (restrict :suppliers, lambda{ city == 'London' }), :cities)
227
+ }
228
+
229
+ # op is a thread-safe Enumerable of tuples, that can be taken as a valid
230
+ # relation representation. It can therefore be used as the input operand
231
+ # of any other expression. This is under Alf's responsibility, and any
232
+ # failure must be considered a bug!
233
+
234
+ ### In shell
235
+
236
+ #
237
+ # Provided that suppliers and cities are valid relation representations
238
+ # [something similar]
239
+ #
240
+ % alf restrict suppliers -- "city == 'London'" | alf join cities
241
+
242
+ # the resulting stream is a valid relation representation in the output
243
+ # stream format that you have selected (.rash by default). It can therefore
244
+ # be piped to another alf shell invocation, or saved to a file and re-read
245
+ later (under the assumption that input and output data formats match, of
246
+ course). [Something similar about responsibility and bug].
247
+
248
+ ### Coping with non-relational data sources (nil, duplicates, etc.)
249
+
250
+ Alf aims at being a tool that helps you tackle practical problems, and
251
+ denormalized and/or noisy data is one of them. Missing values occur. Duplicates
252
+ abound in SQL databases lacking primary keys, and so on. Using Alf's relational
253
+ operators on such inputs is not a good idea, because it is a strong precondition
254
+ violation. This is not because relational theory is weak, but because extending
255
+ it to handle null/nil and duplicates correctly has been proven at best a nightmare,
256
+ and at worst a mess. As a practical exercise, try to extend classical algebra
257
+ with versions of +, -, * and / that handle nil in such a way that the resulting
258
+ theory is sound and still looks intuitive! Then do it on boolean algebra with
259
+ _and_, _or_ and _not_. Then, add null/nil to classical set theory. Classical
260
+ algebra, boolean algebra, and set theory are important building blocks behind
261
+ relational algebra because almost all of its operators are defined on top of
262
+ them...
263
+
264
+ So what? The approach chosen in Alf to handle this conflict is very pragmatic.
265
+ First of all, Alf implements a best effort strategy -- where possible -- to
266
+ remain friendly in presence of null/nil on attributes that have no influence on
267
+ an operator's job. For example, the query below will certainly fail if _status_
268
+ is null/nil, but it probably won't fail if any other attribute is nil.
269
+
270
+ % alf restrict suppliers -- "status > 10"
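In plain Ruby terms, this happens because a predicate only touches the attributes it names. The following hedged illustration uses a plain `select`, not Alf's restrict. A nil `name` is never examined, whereas a nil `status` makes `nil > 10` raise `NoMethodError`.

```ruby
rows = [
  { sid: "S1", name: nil,     status: 20 }, # nil on an attribute the predicate ignores
  { sid: "S2", name: "Jones", status: 10 }
]
kept = rows.select { |t| t[:status] > 10 }
puts kept.map { |t| t[:sid] }.inspect # ["S1"]

begin
  [{ sid: "S3", name: "Blake", status: nil }].select { |t| t[:status] > 10 }
rescue NoMethodError => e
  puts "restrict failed: #{e.class}" # restrict failed: NoMethodError
end
```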
271
+
272
+ This best-effort strategy is not enough and, strictly speaking, must be considered
273
+ unsound (for example, it strongly hurts optimization possibilities). Therefore,
274
+ we strongly encourage you to go a step further: **if relational operators want
275
+ true relations as input, please give them!** For this, Alf also provides a few
276
+ non-relational operators in addition to relational ones. Those operators must be
277
+ interpreted as "pre-relational" operators, in the sense that they help obtain
278
+ valid relation representations from invalid ones. Provided that you use them
279
+ correctly, their output can safely be used as input of a relational operator.
280
+ You'll find:
281
+
282
+ * <code>alf autonum</code> -- ensure no duplicates by generating a unique attribute
283
+ * <code>alf compact</code> -- brute-force duplicates removal
284
+ * <code>alf defaults</code> -- replace nulls/nil by valid values, on an attribute
285
+ basis
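What the first two operators achieve can be approximated on an array of hashes. This is a hedged sketch of the idea only. The method names mirror the commands, but the signatures, the `:autonum` default attribute name and the numbering from 0 are assumptions of this example, not Alf's documented behavior.

```ruby
# Hypothetical: number every tuple so that no two tuples can be equal.
def autonum(tuples, attribute = :autonum)
  tuples.each_with_index.map { |t, i| t.merge(attribute => i) }
end

# Hypothetical: brute-force duplicate removal (Hash equality ignores key order).
def compact(tuples)
  tuples.uniq
end

bag = [
  { sid: "S1", pid: "P1" },
  { sid: "S1", pid: "P1" }, # duplicate, e.g. from a table without a primary key
  { sid: "S2", pid: "P1" }
]

puts compact(bag).size                            # 2
puts autonum(bag).uniq.size                       # 3: every tuple is now distinct
puts autonum(bag).map { |t| t[:autonum] }.inspect # [0, 1, 2]
```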
286
+
287
+ Play the game, it's easy!
288
+
289
+ - _Give id, name and status of suppliers whose status is greater than 10_
290
+ - Hey man, we don't know every supplier's status! What about the others?
291
+ - _Ignore them_
292
+ - No problem dude!
293
+
294
+ % alf defaults --strict suppliers -- sid '' name '' status 0 | alf restrict -- "status > 10"
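The dialogue above can be approximated in plain Ruby. In this hypothetical `defaults` helper, nil values are replaced by the given defaults. With `strict: true`, tuples are also cut down to the defaulted attributes. Reading `--strict` that way is an assumption based on the example above, which lists defaults for `sid`, `name` and `status` only. The sample data containing nils is invented for the illustration.

```ruby
# Hypothetical helper: replace nil (or missing) attributes by default values.
def defaults(tuples, values, strict: false)
  tuples.map do |t|
    # On key clashes, keep the tuple's own value unless it is nil
    filled = t.merge(values) { |_attr, value, default| value.nil? ? default : value }
    strict ? filled.slice(*values.keys) : filled
  end
end

suppliers = [
  { sid: "S1", name: "Smith", status: 20,  city: "London" },
  { sid: "S2", name: "Jones", status: nil, city: "Paris" },
  { sid: "S3", name: "Blake", status: 30,  city: nil }
]

# defaults --strict ... | restrict -- "status > 10"
result = defaults(suppliers, { sid: "", name: "", status: 0 }, strict: true)
           .select { |t| t[:status] > 10 }

puts result.map { |t| t[:sid] }.inspect # ["S1", "S3"]
puts result.first.keys.inspect          # [:sid, :name, :status]
```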
295
+
296
+ ### Alf is duck-typed
297
+
298
+ Relational theory is often considered from a statically-typed point
299
+ of view. When considering tuples and relations, for example, the notion of
300
+ _heading_, a set of (name,type) pairs, is central. For example, a heading for
301
+ a supplier tuple/relation could be:
302
+
303
+ {:sid => String, :name => Name, :status => Integer, :city => String}
304
+
305
+ Most relational operators have preconditions in terms of the headings of their
306
+ operands. For example, _minus_ and _union_ require their operands to have same
307
+ heading, while _rename_ requires renamed attributes to exist in operand's
308
+ heading, and so on. Given an expression in relational algebra, it is always
309
+ possible to compute the heading of the resulting relation, by statically
310
+ analyzing the whole query expression in the light of a catalog of typed
311
+ operators. This way, a tool can check that a query is statically valid, i.e.
312
+ that it respects operator preconditions. While this approach has the major
313
+ advantage of allowing strong optimizations, it also has a few drawbacks (such as
314
+ requiring the headings of datasources to be known in advance) and is difficult to marry
315
+ with dynamically-typed languages like Ruby. Therefore, Alf takes another approach,
316
+ which is similar to duck-typing. In essence, this approach can be summarized as
317
+ follows:
318
+
319
+ - _You have the responsibility of ensuring that the evaluation of your query
320
+ will succeed and will return valid results_
321
+ - No problem dude!
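In a duck-typed setting, a heading can still be observed at run time from the tuples themselves. The sketch below is an opt-in check that a caller could run before a union. `heading_of` is a hypothetical helper, and using each value's Ruby class as its "type" is a simplification.

```ruby
# Hypothetical helper: derive a relation's heading (attribute name => class)
# from its tuples, failing if the tuples disagree.
def heading_of(tuples)
  headings = tuples.map { |t| t.transform_values(&:class) }.uniq
  raise ArgumentError, "tuples disagree on their heading" if headings.size > 1
  headings.first
end

suppliers = [{ sid: "S1", city: "London" }, { sid: "S2", city: "Paris" }]
cities    = [{ city: "London", country: "England" }]

# union requires both operands to have the same heading; nothing enforces
# it statically, so the caller may check it explicitly before combining.
compatible = heading_of(suppliers) == heading_of(cities)
puts compatible                                              # false
puts heading_of(suppliers) == { city: String, sid: String }  # true, order-insensitive
```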
322
+
323
+ ## Getting started in shell
324
+
325
+ % alf --help
326
+
327
+ The help command will display the list of available operators. Each of them is
328
+ completely described with 'alf help OPERATOR'. They all have a similar invocation
329
+ syntax in shell:
330
+
331
+ % alf operator operands... -- args...
332
+
333
+ For example, try the following:
334
+
335
+ # display suppliers that live in Paris
336
+ % alf restrict suppliers -- "city == 'Paris'"
337
+
338
+ # join suppliers and cities (no args here)
339
+ % alf join suppliers cities
340
+
341
+ ### Recognized data streams/files (.rash files)
342
+
343
+ For educational purposes, 'suppliers' and 'cities' inputs are magically resolved
344
+ as denoting the files examples/suppliers.rash and examples/cities.rash,
345
+ respectively. You'll find other data files: parts.rash, supplies.rash that are
346
+ resolved magically as well and with which you can play. For non-educational
347
+ purposes, operands may always be explicit files, or you can force the folder in
348
+ which datasource files have to be found:
349
+
350
+ # The following invocations are equivalent
351
+ % alf restrict /tmp/foo.rash -- "..."
352
+ % alf --env=/tmp restrict foo -- "..."
353
+
354
+ A .rash file is simply a file in which each line is a ruby Hash, intended to
355
+ represent a tuple. Under theory-driven preconditions, a .rash file can be seen
356
+ as a valid (straightforward but useful) physical representation of a relation!
357
+ When used in shell, alf dumps query results in the .rash format by default,
358
+ which makes it possible to pipe invocations! Indeed, unary operators read their
359
+ operand from standard input if it is not specified as a command argument. For example, the
360
+ invocation below is equivalent to the one given above.
361
+
362
+ # display suppliers that live in Paris
363
+ % cat examples/suppliers.rash | alf restrict -- "city == 'Paris'"
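The .rash idea itself fits in a few lines of Ruby: write each tuple with `Hash#inspect`, then read each line back as a Ruby hash literal. This is a hypothetical sketch, not Alf's reader and writer. It parses with `Kernel#eval`, which is only acceptable for trusted files. It also only round-trips values whose `inspect` output is valid Ruby (strings, numbers, symbols, arrays, nested hashes).

```ruby
require 'stringio'

# Hypothetical writer: one Ruby Hash literal per line.
def write_rash(tuples, io)
  tuples.each { |tuple| io.puts(tuple.inspect) }
end

# Hypothetical reader: evaluates each non-blank line (trusted input only!).
def read_rash(io)
  io.each_line.map(&:strip).reject(&:empty?).map { |line| eval(line) }
end

tuples = [
  { sid: "S1", name: "Smith", status: 20, city: "London" },
  { sid: "S2", name: "Jones", status: 10, city: "Paris" }
]

buffer = StringIO.new
write_rash(tuples, buffer)
buffer.rewind

# A filter applied while reading, in the spirit of 'alf restrict' on stdin
paris = read_rash(buffer).select { |t| t[:city] == "Paris" }
puts paris.map { |t| t[:sid] }.inspect # ["S2"]
```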
364
+
365
+ Similarly, when only one operand is present in invocations of binary operators,
366
+ they read their left operand from standard input. Therefore, the join given in
367
+ previous section can also be written as follows:
368
+
369
+ % cat examples/suppliers.rash | alf join cities
370
+
371
+ The relational algebra is _closed_ under its operators, which means that these
372
+ operators take relations as operands and return a relation. Therefore operator
373
+ invocations can be nested, that is, operands can be other relational expressions.
374
+ When you use alf in a shell, it simply means that you can pipe operators as you
375
+ want:
376
+
377
+ % alf show --rash suppliers | alf join cities | alf restrict -- "status > 10"
378
+
379
+ ### Obtaining a friendly output
380
+
381
+ The show command (which is **not** a relational operator) can be used to obtain
382
+ a more friendly output:
383
+
384
+ # it renders a text table by default
385
+ % alf show [--text] suppliers
386
+
387
+ +------+-------+---------+--------+
388
+ | :sid | :name | :status | :city |
389
+ +------+-------+---------+--------+
390
+ | S1 | Smith | 20 | London |
391
+ | S2 | Jones | 10 | Paris |
392
+ | S3 | Blake | 30 | Paris |
393
+ | S4 | Clark | 20 | London |
394
+ | S5 | Adams | 30 | Athens |
395
+ +------+-------+---------+--------+
396
+
397
+ # and reads from standard input without argument!
398
+ % alf restrict suppliers -- "city == 'Paris'" | alf show
399
+
400
+ +------+-------+---------+-------+
401
+ | :sid | :name | :status | :city |
402
+ +------+-------+---------+-------+
403
+ | S2 | Jones | 10 | Paris |
404
+ | S3 | Blake | 30 | Paris |
405
+ +------+-------+---------+-------+
406
+
407
+ Other formats can be obtained (see 'alf help show'). For example, you can generate
408
+ a .yaml file, as follows:
409
+
410
+ % alf restrict suppliers -- "city == 'Paris'" | alf show --yaml
411
+
412
+ ### Executing .alf files
413
+
414
+ You'll also find .alf files in the examples folder that contain more complex
415
+ examples in the Ruby functional syntax (see section below).
416
+
417
+ % cat examples/group.alf
418
+ #!/usr/bin/env alf
419
+ (group :supplies, [:pid, :qty], :supplying)
420
+
421
+ You can simply execute these files with alf directly as follows:
422
+
423
+ # the following works, as well as the shortcut 'alf show group'
424
+ % alf examples/group.alf | alf show
425
+
426
+ +------+-----------------+
427
+ | :sid | :supplying |
428
+ +------+-----------------+
429
+ | S1 | +------+------+ |
430
+ | | | :pid | :qty | |
431
+ | | +------+------+ |
432
+ | | | P1 | 300 | |
433
+ | | | P2 | 200 | |
434
+ ...
435
+
436
+ Also, mimicking the ruby executable, the following invocation is also possible:
437
+
438
+ % alf -e "(restrict :suppliers, lambda{ city == 'Paris' })"
439
+
440
+ where the argument is a relational expression in Alf's Lispy dialect, which
441
+ is detailed in the next section.
442
+
443
+ ## Lispy expressions
444
+
445
+ If you take a look at .alf example files, you'll find functional ruby expressions
446
+ like the following:
447
+
448
+ % cat examples/minus.alf
449
+
450
+ # Give all suppliers, except those living in Paris
451
+ (minus :suppliers,
452
+ (restrict :suppliers, lambda{ city == 'Paris' }))
453
+
454
+ # This is a contrived example for illustrating minus, as the
455
+ # following is equivalent
456
+ (restrict :suppliers, lambda{ city != 'Paris' })
457
+
458
+ You can simply execute such expressions with the alf command line itself (the
459
+ three following invocations return the same result):
460
+
461
+ % alf examples/minus.alf | alf show
462
+ % alf show minus
463
+ % alf -e "(restrict :suppliers, lambda{ city != 'Paris' })" | alf show
464
+
465
+ Symbols are magically resolved from the environment, which is wired to the
466
+ examples by default. See the dedicated sections below to adapt this behavior
467
+ to your needs.
468
+
469
+ ### Algebra is closed under its operators!
470
+
471
+ Of course, thanks to the closure property of relational algebra (which states that
472
+ operators work on relations and return relations), you can use a sub-expression
473
+ *every time* a relational operand is expected, every time:
474
+
475
+ # Compute the total qty supplied in each country together with the subset
476
+ # of products shipped there. Only consider suppliers that have a status
477
+ # greater than 10, however.
478
+ (summarize \
479
+ (join \
480
+ (join (restrict :suppliers, lambda{ status > 10 }),
481
+ :supplies),
482
+ :cities),
483
+ [:country],
484
+ :which => Agg::group(:pid),
485
+ :total => Agg::sum{ qty })
486
+
487
+ Of course, complex queries quickly become unreadable that way. But you can always
+ split complex tasks into simpler ones using _with_:
+
+     with( :kept_suppliers => (restrict :suppliers, lambda{ status > 10 }),
+           :with_countries => (join :kept_suppliers, :cities),
+           :supplying      => (join :with_countries, :supplies) ) do
+       (summarize :supplying,
+                  [:country],
+                  :which => Agg::group(:pid),
+                  :total => Agg::sum{ qty })
+     end
+
+ And here is the result:
+
+     +----------+----------+--------+
+     | :country | :which   | :total |
+     +----------+----------+--------+
+     | England  | +------+ | 2200   |
+     |          | | :pid | |        |
+     |          | +------+ |        |
+     |          | | P1   | |        |
+     |          | | P2   | |        |
+     |          | | P3   | |        |
+     |          | | P4   | |        |
+     |          | | P5   | |        |
+     |          | | P6   | |        |
+     |          | +------+ |        |
+     | France   | +------+ | 200    |
+     |          | | :pid | |        |
+     |          | +------+ |        |
+     |          | | P2   | |        |
+     |          | +------+ |        |
+     +----------+----------+--------+
+
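The closure property is what makes both forms above possible. As a rough plain-Ruby illustration (not how Alf is implemented), the hypothetical helpers below take and return arrays of Hashes, so they nest freely, and naming intermediate results plays the role of _with_. The relations are small made-up samples, and `which` is an Array of product ids rather than a nested relation as produced by `Agg::group(:pid)`.

```ruby
# Made-up sample relations (arrays of Hashes), not the real example files.
SUPPLIERS = [
  { sid: 'S1', status: 20, city: 'London' },
  { sid: 'S2', status: 10, city: 'Paris' },
  { sid: 'S3', status: 30, city: 'Paris' }
].freeze
SUPPLIES = [
  { sid: 'S1', pid: 'P1', qty: 300 },
  { sid: 'S1', pid: 'P2', qty: 200 },
  { sid: 'S2', pid: 'P1', qty: 300 },
  { sid: 'S3', pid: 'P2', qty: 200 }
].freeze
CITIES = [
  { city: 'London', country: 'England' },
  { city: 'Paris',  country: 'France' }
].freeze

# Each helper takes relations and returns a relation (closure property).
def restrict(rel, &pred)
  rel.select(&pred)
end

# Natural join: combine tuples that agree on all shared attributes.
def join(left, right)
  left.flat_map do |l|
    right.filter_map do |r|
      shared = l.keys & r.keys
      l.merge(r) if shared.all? { |k| l[k] == r[k] }
    end
  end.uniq
end

# Group tuples on the `by` attributes and compute one value per aggregate.
def summarize(rel, by, aggs)
  rel.group_by { |t| t.slice(*by) }.map do |key, tuples|
    aggs.each_with_object(key.dup) { |(name, agg), out| out[name] = agg.call(tuples) }
  end
end

AGGS = {
  which: ->(ts) { ts.map { |t| t[:pid] }.uniq.sort }, # stands in for Agg::group(:pid)
  total: ->(ts) { ts.sum { |t| t[:qty] } }            # stands in for Agg::sum{ qty }
}.freeze

# Nested form: every operand may itself be an expression.
nested = summarize(
  join(join(restrict(SUPPLIERS) { |t| t[:status] > 10 }, SUPPLIES), CITIES),
  [:country], AGGS
)

# Staged form, in the spirit of `with`: name intermediate relations first.
kept_suppliers = restrict(SUPPLIERS) { |t| t[:status] > 10 }
with_countries = join(kept_suppliers, CITIES)
supplying      = join(with_countries, SUPPLIES)
staged         = summarize(supplying, [:country], AGGS)

p nested
puts nested == staged # => true
```

The two join orders differ, yet the results are equal because a natural join is commutative and associative, which is what lets _with_ bindings be arranged for readability.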
+ ### Going further
+
+ For now, the Ruby API is documented in the command-line help itself (a cheatsheet
+ or something similar will be provided as soon as possible). For example, you'll
+ find the allowed syntaxes for RESTRICT as follows:
+
+     % alf help restrict
+
+     ...
+     API & EXAMPLE
+
+     # Restrict to suppliers with status greater than 20
+     (restrict :suppliers, lambda{ status > 20 })
+
+     # Restrict to suppliers that live in London
+     (restrict :suppliers, lambda{ city == 'London' })
+     ...
+
+ ## Interfacing Alf in Ruby
+
+ ### Calling commands à la shell
+
+ For simple cases, the easiest way of using Alf in ruby is probably to mimic
+ what you have in shell:
+
+     % alf restrict suppliers -- "city == 'Paris'"
+
+ Then, in ruby:
+
+     #
+     # 1. create an engine on an environment (see section about environments later)
+     # 2. run a command
+     # 3. op is a thread-safe enumerable of tuples (see the Lispy section below)
+     #
+     lispy = Alf.lispy(Alf::Environment.examples)
+     op = lispy.run(['restrict', 'suppliers', '--', "city == 'Paris'"])
+
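The array passed to `lispy.run` is simply the argument vector the shell would hand to `alf`. As a quick check, Ruby's standard `Shellwords.split` turns the shell command line into that same array; this snippet does not call Alf itself.

```ruby
require 'shellwords'

# The command typed in the shell, without the leading `alf` executable name.
command_line = %q{restrict suppliers -- "city == 'Paris'"}

# Split it the way a POSIX shell would: the double quotes group the predicate
# into a single argument and the single quotes inside it are kept verbatim.
argv = Shellwords.split(command_line)
p argv # => ["restrict", "suppliers", "--", "city == 'Paris'"]
```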
+ If this kind of API is not expressive enough for you, you'll have to learn
+ the APIs in more depth and use the Lispy functional style that Alf provides,
+ which can be compiled and used as explained in the next section.
+
+ ### Compiling lispy expressions
+
+ If you want to use Alf in ruby directly (that is, not in shell or by executing
+ .alf files), you can simply compile expressions and use the resulting operators
+ as follows:
+
+     #
+     # Expressions can simply be compiled as illustrated below. We use the
+     # examples environment here; see the dedicated section later about other
+     # available environments.
+     #
+     lispy = Alf.lispy(Alf::Environment.examples)
+     op = lispy.compile do
+       (restrict :suppliers, lambda{ city == 'London' })
+     end
+
+     #
+     # The returned _op_ is an enumerable of ruby hashes. Provided that datasets
+     # offered by the environment (:suppliers here) can be enumerated more than
+     # once, the operator may be used multiple times and is even thread safe!
+     #
+     op.each do |tuple|
+       # tuple is a ruby Hash
+     end
+
+     #
+     # Now, maybe you want to reuse op in a larger query, for example
+     # by projecting on the city attribute... Here is how _with_ expressions
+     # can be used in that case:
+     #
+     projection = lispy.with(:kept_suppliers => op) do
+       (project :kept_suppliers, [:city])
+     end
+
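The proviso that datasets can be enumerated more than once matters because such an operator is typically lazy: each call to `each` re-reads its source. The hypothetical `Restrict` class below is a minimal sketch of that behavior (not Alf's operator class): over an Array it yields the same tuples on every pass, while over a one-shot source such as lines read from an IO it only works once.

```ruby
require 'stringio'

# Hypothetical lazy operator (not Alf's class): nothing is cached, so every
# call to #each re-enumerates the source.
class Restrict
  include Enumerable

  def initialize(source, &pred)
    @source = source
    @pred = pred
  end

  def each
    return enum_for(:each) unless block_given?
    @source.each { |t| yield t if @pred.call(t) }
  end
end

# An Array can be enumerated any number of times: both passes agree.
suppliers = [{ sid: 'S1', city: 'London' }, { sid: 'S2', city: 'Paris' }]
op = Restrict.new(suppliers) { |t| t[:city] == 'London' }
first_pass  = op.map { |t| t[:sid] }
second_pass = op.map { |t| t[:sid] }

# Lines read from an IO can only be consumed once: the second pass is empty.
io = StringIO.new("London\nParis\n")
one_shot = Restrict.new(io.each_line) { |line| line.start_with?('London') }
first_count  = one_shot.count
second_count = one_shot.count

p [first_pass, second_pass]   # => [["S1"], ["S1"]]
p [first_count, second_count] # => [1, 0]
```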
+ ## Going further
+
+ ### Using/Implementing other Environments
+
+ An Environment instance is passed as the first argument of <code>Alf.lispy</code>
+ and is responsible for resolving named datasets. A ready-made class,
+ Environment::Folder, is provided with the Alf distribution, along with a factory
+ method on the Environment class itself:
+
+     env = Alf::Environment.folder("path/to/a/folder")
+
+ An environment built that way will look for .rash and .alf files in the specified
+ folder and its sub-folders. I'll of course strongly consider any contribution
+ implementing the Environment contract on top of SQL or NoSQL databases, or anything
+ else worth manipulating with relational algebra. Such contributions can be added
+ to the project directly, in the lib/alf/environment folder, for example. A basic
+ template looks like this:
+
+     class Foo < Alf::Environment
+
+       #
+       # You should at least implement the _dataset_ method that resolves a
+       # name (a Symbol instance) to an Enumerable of tuples (typically a
+       # Reader). See Alf::Environment for exact contract details.
+       #
+       def dataset(name)
+       end
+
+     end
+
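A minimal, self-contained sketch of that contract, assuming only what the comment above states (a Symbol name resolved to an Enumerable of tuples). `MemoryEnvironment` is a hypothetical name; it deliberately does not subclass `Alf::Environment` so it runs without the gem, and raising `ArgumentError` for unknown names is an arbitrary choice rather than Alf's documented behavior.

```ruby
# Hypothetical, standalone sketch of the Environment contract: resolve a
# dataset name (a Symbol) to an Enumerable of tuples. It does not subclass
# Alf::Environment, so it runs without the gem.
class MemoryEnvironment
  def initialize(datasets)
    @datasets = datasets # e.g. { suppliers: [ { sid: 'S1', ... }, ... ] }
  end

  # Returns the Enumerable of tuples registered under +name+.
  # Raising ArgumentError for unknown names is an arbitrary choice here.
  def dataset(name)
    @datasets.fetch(name.to_sym) { raise ArgumentError, "unknown dataset: #{name}" }
  end
end

env = MemoryEnvironment.new(
  suppliers: [{ sid: 'S1', city: 'London' }, { sid: 'S2', city: 'Paris' }]
)

paris = env.dataset(:suppliers).select { |t| t[:city] == 'Paris' }
p paris
```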
+ ### Adding file decoders, aka Readers
+
+ Environments should not be confused with Readers (see the Reader class and its
+ subclasses). While the former resolve named datasets, the latter decode files
+ and/or other resources as tuple enumerables. Environments typically serve Reader
+ instances when resolving datasets.
+
+ Reader implementations decoding .rash and .alf files are provided in the main
+ alf.rb file. It's relatively easy to implement the Reader contract by extending
+ the Reader class and implementing an each method. Once again, contributions are
+ very welcome in lib/alf/reader (.csv files, .log files, and so on). A basic
+ template for this is as follows:
+
+     class Bar < Alf::Reader
+
+       #
+       # You should at least implement each; see Alf::Reader, which provides a
+       # base implementation and a few tools.
+       #
+       def each
+         # [...]
+       end
+
+       # By registering it, the Folder environment will automatically
+       # recognize and decode .bar files correctly!
+       Alf::Reader.register(:bar, [".bar"], self)
+
+     end
+
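For instance, a reader for simple log-like lines such as `sid=S1 city=London` could look like the sketch below. `KeyValueReader` and its line format are hypothetical; it follows the "implement each" contract described above but does not subclass `Alf::Reader` or call `register`, so it runs standalone. All values are kept as Strings.

```ruby
require 'stringio'

# Hypothetical reader for lines such as "sid=S1 city=London". It follows the
# "implement each" contract but does not subclass Alf::Reader or register
# itself, so it runs standalone. Values are kept as Strings.
class KeyValueReader
  include Enumerable

  # +input+ is anything responding to #each_line (File, StringIO, ...).
  def initialize(input)
    @input = input
  end

  # Yields one tuple (a Hash with Symbol keys) per non-empty, non-comment line.
  def each
    return enum_for(:each) unless block_given?
    @input.each_line do |line|
      line = line.strip
      next if line.empty? || line.start_with?('#')
      pairs = line.split.map { |pair| pair.split('=', 2) }
      tuple = pairs.to_h { |key, value| [key.to_sym, value] }
      yield tuple
    end
  end
end

input  = StringIO.new("# suppliers\nsid=S1 city=London\n\nsid=S2 city=Paris\n")
tuples = KeyValueReader.new(input).to_a
p tuples
```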
+ ### Adding outputters, aka Renderers
+
+ Similarly, you can contribute renderers to output relations in HTML or whatever
+ other format you find interesting. See the Renderer class, and consider the
+ following template for contributions in lib/alf/renderer:
+
+     class Glim < Alf::Renderer
+
+       #
+       # You should at least implement the execute method that renders tuples
+       # given in _input_ (an Enumerable of tuples) on the output buffer
+       # and returns the latter. See Alf::Renderer for the exact contract
+       # details.
+       #
+       def execute(output = $stdout)
+         # [...]
+         output
+       end
+
+       # By registering it, the output options of 'alf show' will
+       # automatically provide your --glim contribution
+       Alf::Renderer.register(:glim, "as a .glim file", self)
+
+     end
+
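As an illustration of the execute contract (render the tuples on the output buffer and return it), here is a hypothetical HTML table renderer. It is standalone rather than a subclass of `Alf::Renderer`, receives its input in the constructor as an assumption, and escapes HTML by hand to avoid extra dependencies.

```ruby
require 'stringio'

# Hypothetical renderer following the contract described above: render the
# tuples of the input on +output+ and return +output+. It is standalone (no
# Alf::Renderer superclass) and receives its input in the constructor.
class HtmlTableRenderer
  ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

  def initialize(input)
    @input = input # an Enumerable of tuples (Hashes)
  end

  def execute(output = $stdout)
    tuples = @input.to_a
    attrs  = tuples.flat_map(&:keys).uniq # header: every attribute seen
    output << "<table>\n"
    output << '<tr>' << attrs.map { |a| "<th>#{esc(a)}</th>" }.join << "</tr>\n"
    tuples.each do |t|
      output << '<tr>' << attrs.map { |a| "<td>#{esc(t[a])}</td>" }.join << "</tr>\n"
    end
    output << "</table>\n"
    output
  end

  private

  # Minimal HTML escaping of the characters that matter in element content.
  def esc(value)
    value.to_s.gsub(/[&<>"]/, ESCAPES)
  end
end

tuples = [{ sid: 'S1', city: 'London' }, { sid: 'S2', city: '<Paris>' }]
html = HtmlTableRenderer.new(tuples).execute(StringIO.new).string
puts html
```

Returning the buffer lets callers pass a StringIO, as here, or rely on the `$stdout` default.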
+ ## Related Work & Tools
+
+ - You should certainly have a look at the Third Manifesto website: http://www.thethirdmanifesto.com/
+ - Why not read the {http://www.dcs.warwick.ac.uk/~hugh/TTM/DBE-Chapter01.pdf
+   third manifesto paper} itself?
+ - Also have a look at {http://www.dcs.warwick.ac.uk/~hugh/TTM/Projects.html other
+   implementation projects}, especially {http://dbappbuilder.sourceforge.net/Rel.php Rel},
+   which provides an implementation of the Tutorial D language.
+ - {https://github.com/dkubb/veritas Dan Kubb's Veritas} project is also worth
+   considering in the Ruby community. While very similar to Alf in providing a pure
+   ruby algebra implementation, Veritas mostly provides a framework for manipulating
+   and statically analyzing algebra expressions so as to be able to
+   {https://github.com/dkubb/veritas-optimizer optimize them} and
+   {https://github.com/dkubb/veritas-sql-generator compile them to SQL}. We are
+   working together with Dan Kubb to see how Alf and Veritas could be brought closer
+   to each other in the future, if not in their codebases, at least by using the
+   same terminology for the same concepts.
+
+ ## Contributing
+
+ ### Alf is open source
+
+ You know the rules:
+
+ * The code is on GitHub: https://github.com/blambeau/alf
+ * Please report any problem or bug in the issue tracker on GitHub
+ * Don't hesitate to fork and send me a pull request for any contribution/idea!
+
+ Alf is distributed under the MIT licence. Please let me know if it does not fit
+ your needs and I'll see what I can do!
+
+ ### Internals -- Tribute to Sinatra
+
+ Alf's code style is strongly inspired by what I found in Sinatra when looking
+ at its internals a few months ago. Like Sinatra, Alf is mostly implemented in a
+ single file, lib/alf.rb. Everything is there except additional contributions
+ (in lib/alf/...). You'll need an editor or IDE that supports code folding/unfolding.
+ Then, follow the guide:
+
+ 1. Fold everything but the Alf module.
+ 2. The main concepts (the first level of abstraction) should now fit on the screen.
+ 3. Unfold the concept you're interested in, and return to the previous step.
+
+ ### Roadmap
+
+ Below is how I imagine Alf's future. This is only my own wish list, and I would
+ love to hear yours.
+
+ - Towards 1.0.0, I would like to stabilize and document Alf's public APIs as well
+   as its internals (a few concepts are still unstable there). Alf also has a number
+   of limitations that are worth overcoming for version 1.0.0. These include the
+   semantically wrong way of applying joins on sub-relations, the impossibility of
+   using Lispy expressions on sub-relations in extend, and the error management,
+   which is so far unspecific and unfriendly.
+ - I would also like to start collecting Reader, Renderer and Environment
+   contributions for common data sources (SQL, NoSQL, CSV, logs) and output
+   formats (HTML, XML, JSON). Contributions could be developed either as separate
+   gem projects or distributed with Alf's gem and source code; I still need to
+   decide the exact policy (suggestions are more than welcome here).
+ - Alf will remain a practical tool above all. In the medium term, I would like
+   to complete the set of available operators (relational and non-relational
+   ones). Some of them will be operators described in Date & Darwen's (D & D)
+   books, while others will be new suggestions of mine.
+ - In the long term, Alf should be able to avoid loading tuples in memory (under
+   certain conditions on data sources) for almost all queries.
+ - Without aiming at a fast tool at all, I would also like Alf to provide a basic
+   optimizer able to push equality restrictions down and to materialize
+   sub-expressions used more than once in _with_ expressions.
+
+ ### Versioning policy
+
+ Alf respects {http://semver.org/ semantic versioning}, which means that it has
+ an X.Y.Z version number and follows a few rules:
+
+ - The public API is made up of both the command-line tool and the Lispy
+   dialect, and will become stable with version 1.0.0 in the near future.
+ - Backward-compatible bug fixes will increase Z.
+ - New features and enhancements that do not break backward compatibility of the
+   public API will increase the Y number.
+ - Backward-incompatible changes of the public API will increase the X number.
+
+ All classes and modules except the Alf module itself and the Lispy DSL are part of
+ the private API and may change at any time. A best-effort strategy is followed
+ to avoid breaking internals on tiny (Z) version increases.
+
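The bump rules above can be sketched as a small helper. `next_version` is hypothetical and, as the Semantic Versioning specification requires, it resets the lower components to zero. The last lines use RubyGems' `Gem::Requirement` to show that a pessimistic constraint such as `~> 0.9.0` accepts Z increases only.

```ruby
require 'rubygems' # Gem::Version / Gem::Requirement (loaded by default in modern Ruby)

# Hypothetical helper applying the bump rules above. Lower components are
# reset to zero, as the Semantic Versioning specification requires.
def next_version(current, change)
  x, y, z = current.split('.').map(&:to_i)
  bumped =
    case change
    when :bugfix   then [x, y, z + 1] # backward-compatible fix: Z
    when :feature  then [x, y + 1, 0] # backward-compatible feature: Y
    when :breaking then [x + 1, 0, 0] # backward-incompatible change: X
    else raise ArgumentError, "unknown change kind: #{change.inspect}"
    end
  bumped.join('.')
end

p next_version('0.9.0', :bugfix)   # => "0.9.1"
p next_version('0.9.0', :feature)  # => "0.10.0"
p next_version('0.9.0', :breaking) # => "1.0.0"

# A pessimistic constraint accepts Z increases only.
req = Gem::Requirement.new('~> 0.9.0')
p req.satisfied_by?(Gem::Version.new('0.9.1'))  # => true
p req.satisfied_by?(Gem::Version.new('0.10.0')) # => false
```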
+ ## Enjoy Alf!
+
+ - No problem dude!