alf 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (94)
  1. data/CHANGELOG.md +5 -0
  2. data/Gemfile +2 -0
  3. data/Gemfile.lock +42 -0
  4. data/LICENCE.md +22 -0
  5. data/Manifest.txt +15 -0
  6. data/README.md +769 -0
  7. data/Rakefile +23 -0
  8. data/TODO.md +26 -0
  9. data/alf.gemspec +191 -0
  10. data/alf.noespec +30 -0
  11. data/bin/alf +31 -0
  12. data/examples/autonum.alf +6 -0
  13. data/examples/cities.rash +4 -0
  14. data/examples/clip.alf +3 -0
  15. data/examples/compact.alf +2 -0
  16. data/examples/database.alf +6 -0
  17. data/examples/defaults.alf +3 -0
  18. data/examples/extend.alf +3 -0
  19. data/examples/group.alf +3 -0
  20. data/examples/intersect.alf +4 -0
  21. data/examples/join.alf +2 -0
  22. data/examples/minus.alf +8 -0
  23. data/examples/nest.alf +2 -0
  24. data/examples/nulls.rash +3 -0
  25. data/examples/parts.rash +6 -0
  26. data/examples/project.alf +2 -0
  27. data/examples/quota.alf +4 -0
  28. data/examples/rename.alf +3 -0
  29. data/examples/restrict.alf +2 -0
  30. data/examples/runall.sh +26 -0
  31. data/examples/schema.yaml +28 -0
  32. data/examples/sort.alf +4 -0
  33. data/examples/summarize.alf +16 -0
  34. data/examples/suppliers.rash +5 -0
  35. data/examples/supplies.rash +12 -0
  36. data/examples/ungroup.alf +4 -0
  37. data/examples/union.alf +3 -0
  38. data/examples/unnest.alf +4 -0
  39. data/examples/with.alf +23 -0
  40. data/lib/alf.rb +2984 -0
  41. data/lib/alf/loader.rb +1 -0
  42. data/lib/alf/renderer/text.rb +153 -0
  43. data/lib/alf/renderer/yaml.rb +22 -0
  44. data/lib/alf/version.rb +14 -0
  45. data/spec/aggregator_spec.rb +62 -0
  46. data/spec/alf_spec.rb +47 -0
  47. data/spec/assumptions_spec.rb +15 -0
  48. data/spec/environment/explicit_spec.rb +15 -0
  49. data/spec/environment/folder_spec.rb +30 -0
  50. data/spec/examples_spec.rb +26 -0
  51. data/spec/lispy_spec.rb +23 -0
  52. data/spec/operator/command_methods_spec.rb +38 -0
  53. data/spec/operator/non_relational/autonum_spec.rb +61 -0
  54. data/spec/operator/non_relational/clip_spec.rb +49 -0
  55. data/spec/operator/non_relational/compact/buffer_based.rb +30 -0
  56. data/spec/operator/non_relational/compact/sort_based_spec.rb +30 -0
  57. data/spec/operator/non_relational/compact_spec.rb +38 -0
  58. data/spec/operator/non_relational/defaults_spec.rb +55 -0
  59. data/spec/operator/non_relational/sort_spec.rb +66 -0
  60. data/spec/operator/relational/extend_spec.rb +34 -0
  61. data/spec/operator/relational/group_spec.rb +54 -0
  62. data/spec/operator/relational/intersect_spec.rb +58 -0
  63. data/spec/operator/relational/join/hash_based_spec.rb +63 -0
  64. data/spec/operator/relational/minus_spec.rb +56 -0
  65. data/spec/operator/relational/nest_spec.rb +32 -0
  66. data/spec/operator/relational/project_spec.rb +65 -0
  67. data/spec/operator/relational/quota_spec.rb +44 -0
  68. data/spec/operator/relational/rename_spec.rb +32 -0
  69. data/spec/operator/relational/restrict_spec.rb +56 -0
  70. data/spec/operator/relational/summarize/sort_based_spec.rb +31 -0
  71. data/spec/operator/relational/summarize_spec.rb +41 -0
  72. data/spec/operator/relational/ungroup_spec.rb +35 -0
  73. data/spec/operator/relational/union_spec.rb +35 -0
  74. data/spec/operator/relational/unnest_spec.rb +32 -0
  75. data/spec/reader/alf_file_spec.rb +15 -0
  76. data/spec/reader/input.rb +2 -0
  77. data/spec/reader/rash_spec.rb +31 -0
  78. data/spec/reader_spec.rb +27 -0
  79. data/spec/renderer/text/cell_spec.rb +34 -0
  80. data/spec/renderer/text/row_spec.rb +30 -0
  81. data/spec/renderer/text/table_spec.rb +39 -0
  82. data/spec/renderer_spec.rb +42 -0
  83. data/spec/spec_helper.rb +26 -0
  84. data/spec/tools/ordering_key_spec.rb +81 -0
  85. data/spec/tools/projection_key_spec.rb +83 -0
  86. data/spec/tools/tools_spec.rb +25 -0
  87. data/spec/tools/tuple_handle_spec.rb +78 -0
  88. data/tasks/debug_mail.rake +78 -0
  89. data/tasks/debug_mail.txt +13 -0
  90. data/tasks/gem.rake +68 -0
  91. data/tasks/spec_test.rake +79 -0
  92. data/tasks/unit_test.rake +77 -0
  93. data/tasks/yard.rake +51 -0
  94. metadata +282 -0
data/CHANGELOG.md ADDED
@@ -0,0 +1,5 @@
1
+ # 0.9.0 / 2011.06.19
2
+
3
+ * Enhancements
4
+
5
+ * Birthday!
data/Gemfile ADDED
@@ -0,0 +1,2 @@
1
+ source 'http://rubygems.org'
2
+ gemspec :name => "alf"
data/Gemfile.lock ADDED
@@ -0,0 +1,42 @@
1
+ PATH
2
+ remote: .
3
+ specs:
4
+ alf (0.9.0)
5
+ quickl (~> 0.2.1)
6
+
7
+ GEM
8
+ remote: http://rubygems.org/
9
+ specs:
10
+ bluecloth (2.0.11)
11
+ diff-lcs (1.1.2)
12
+ highline (1.6.2)
13
+ noe (1.3.0)
14
+ highline (~> 1.6.0)
15
+ quickl (~> 0.2.0)
16
+ wlang (~> 0.10.1)
17
+ quickl (0.2.1)
18
+ rake (0.8.7)
19
+ rspec (2.6.0)
20
+ rspec-core (~> 2.6.0)
21
+ rspec-expectations (~> 2.6.0)
22
+ rspec-mocks (~> 2.6.0)
23
+ rspec-core (2.6.4)
24
+ rspec-expectations (2.6.0)
25
+ diff-lcs (~> 1.1.2)
26
+ rspec-mocks (2.6.0)
27
+ wlang (0.10.2)
28
+ yard (0.7.2)
29
+
30
+ PLATFORMS
31
+ java
32
+ ruby
33
+
34
+ DEPENDENCIES
35
+ alf!
36
+ bluecloth (~> 2.0.9)
37
+ bundler (~> 1.0)
38
+ noe (~> 1.3.0)
39
+ rake (~> 0.8.7)
40
+ rspec (~> 2.6.0)
41
+ wlang (~> 0.10.1)
42
+ yard (~> 0.7.2)
data/LICENCE.md ADDED
@@ -0,0 +1,22 @@
1
+ # The MIT Licence
2
+
3
+ Copyright (c) 2011 - Bernard Lambeau
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/Manifest.txt ADDED
@@ -0,0 +1,15 @@
1
+ bin/**/*
2
+ examples/**/*
3
+ lib/**/*
4
+ spec/**/*
5
+ tasks/**/*
6
+ Rakefile
7
+ alf.gemspec
8
+ alf.noespec
9
+ CHANGELOG.md
10
+ Gemfile
11
+ Gemfile.lock
12
+ LICENCE.md
13
+ Manifest.txt
14
+ README.md
15
+ TODO.md
data/README.md ADDED
@@ -0,0 +1,769 @@
1
+ # Alf - Classy data-manipulation dressed in a DSL (+ commandline)
2
+
3
+ % [sudo] gem install alf
4
+ % alf --help
5
+
6
+ ## Links
7
+
8
+ * {http://rubydoc.info/github/blambeau/alf/master/frames} (read this file there!)
9
+ * {http://github.com/blambeau/alf} (source code)
10
+ * {http://revision-zero.org} (author's blog)
11
+
12
+ ## Description
13
+
14
+ Alf is a commandline tool and Ruby library to manipulate data with all the power
15
+ of a truly relational algebra approach. Objectives behind Alf are manifold:
16
+
17
+ * Pragmatically, Alf aims at being a useful commandline executable for
18
+ manipulating csv files, database records, or whatever looks like a (physical
19
+ representation of a) relation. See 'alf --help' for the list of available
20
+ commands and implemented relational operators.
21
+
22
+ % alf restrict suppliers -- "city == 'London'" | alf join cities
23
+
24
+ * Alf is also a 100% Ruby relational algebra implementation shipped with a simple
25
+ to use, powerful, functional DSL for compiling and evaluating relational queries.
26
+ Alf is not limited to simple scalar values, but admits values of arbitrary
27
+ complexity (under a few requirements about their implementation, see next
28
+ section). See 'alf --help' as well as .alf files in the examples directory
29
+ for syntactic examples.
30
+
31
+ Alf.lispy.compile{
32
+ (join (restrict :suppliers, lambda{ city == 'London' }), :cities)
33
+ }
34
+
35
+ * Alf is also an educational tool, that I've written to draw people's attention
36
+ to the little-known relational theory (which is ill-represented by SQL). The tool
37
+ is largely inspired by TUTORIAL D, the tutorial language of Chris Date and
38
+ Hugh Darwen in their books, more specifically in
39
+ {http://www.thethirdmanifesto.com/ The Third Manifesto (TTM)}.
40
+ However, Alf only provides an overview of the relational _algebra_ defined
41
+ there (Alf is neither a relational _database_, nor a relational _language_).
42
+ I hope that people (especially talented developers) will be sufficiently
43
+ enticed by features shown here to open that book, read it more deeply, and
44
+ implement new stuff around Date & Darwen vision. Have a look at the result of
45
+ the following query for things that you'll never ever have in SQL (see also
46
+ 'alf help quota', 'alf help nest', 'alf help group', ...):
47
+
48
+ % alf --text summarize supplies --by=sid -- total "sum(:qty)" -- which "group(:pid)"
49
+
50
+ * Last, but not least, Alf is an attempt to help me test some research ideas and
51
+ communicate about them with people that already know (all or part) of the TTM
52
+ vision of relational theory. These people include members of the TTM mailing
53
+ list as well as other people implementing some of the TTM ideas (see
54
+ {https://github.com/dkubb/veritas Dan Kubb's Veritas project} for example). For
55
+ this reason, specific features and/or operators are mine, should be considered
56
+ 'research work in progress', and used with care because not necessarily in
57
+ conformance with the TTM.
58
+
59
+ % alf --text quota supplies --by=sid --order=qty -- pos "count()"
60
+
61
+ ## Overview of relational theory
62
+
63
+ We quickly recall relational theory in this section, as described in the TTM
64
+ book. Readers not familiar with Date and Darwen's vision of relational theory
65
+ should probably read this section, even if fluent in SQL. Others may probably
66
+ skip this section. A quick test?
67
+
68
+ > _A relation is a value, precisely a set of tuples, which are themselves values.
69
+ Therefore, a relation is immutable, not ordered, does not contain duplicates,
70
+ and does not have null/nil attributes._
71
+
72
+ Familiar? Skip. Otherwise, read on.
73
+
74
+ ### The example database
75
+
76
+ This README file shows a lot of examples built on top of the following suppliers
77
+ & parts database (almost identical to the original version in C.J. Date database
78
+ books). By default, the alf command line is wired to this embedded example. All
79
+ examples shown here should therefore work immediately, if you want to reproduce
80
+ them!
81
+
82
+ % alf show database
83
+
84
+ +-------------------------------------+-------------------------------------------------+-------------------------+------------------------+
85
+ | :suppliers | :parts | :cities | :supplies |
86
+ +-------------------------------------+-------------------------------------------------+-------------------------+------------------------+
87
+ | +------+-------+---------+--------+ | +------+-------+--------+------------+--------+ | +----------+----------+ | +------+------+------+ |
88
+ | | :sid | :name | :status | :city | | | :pid | :name | :color | :weight | :city | | | :city | :country | | | :sid | :pid | :qty | |
89
+ | +------+-------+---------+--------+ | +------+-------+--------+------------+--------+ | +----------+----------+ | +------+------+------+ |
90
+ | | S1 | Smith | 20 | London | | | P1 | Nut | Red | 12.0000000 | London | | | London | England | | | S1 | P1 | 300 | |
91
+ | | S2 | Jones | 10 | Paris | | | P2 | Bolt | Green | 17.0000000 | Paris | | | Paris | France | | | S1 | P2 | 200 | |
92
+ | | S3 | Blake | 30 | Paris | | | P3 | Screw | Blue | 17.0000000 | Oslo | | | Athens | Greece | | | S1 | P3 | 400 | |
93
+ | | S4 | Clark | 20 | London | | | P4 | Screw | Red | 14.0000000 | London | | | Brussels | Belgium | | | S1 | P4 | 200 | |
94
+ | | S5 | Adams | 30 | Athens | | | P5 | Cam | Blue | 12.0000000 | Paris | | +----------+----------+ | | S1 | P5 | 100 | |
95
+ | +------+-------+---------+--------+ | | P6 | Cog | Red | 19.0000000 | London | | | | S1 | P6 | 100 | |
96
+ | | +------+-------+--------+------------+--------+ | | | S2 | P1 | 300 | |
97
+ | | | | | S2 | P2 | 400 | |
98
+ | | | | | S3 | P2 | 200 | |
99
+ | | | | | S4 | P2 | 200 | |
100
+ | | | | | S4 | P4 | 300 | |
101
+ | | | | | S4 | P5 | 400 | |
102
+ | | | | +------+------+------+ |
103
+ +-------------------------------------+-------------------------------------------------+-------------------------+------------------------+
104
+
105
+ Many people think that relational databases are necessarily 'flat', that they are
106
+ necessarily limited to simple scalar values in two-dimensional tables. This is
107
+ wrong; most SQL databases are indeed 'flat', but _relations_ (in the mathematical
108
+ sense of the relational theory) are not! Look, **the example above is a relation!**;
109
+ that 'contains' other relations as particular values, which, in turn, could
110
+ 'contain' relations or any other 'simple' or more 'complex' value... This is not
111
+ "flat" at all, after all :-)
112
+
113
+ ### Types and Values
114
+
115
+ To understand exactly what a relation is, one needs to remember elementary
116
+ notions of set theory and the concepts of _type_ and _value_.
117
+
118
+ * A _type_ is a finite set of values; it is not particularly ordered and, being
119
+ a set, it never contains two values which are considered equal.
120
+
121
+ * A _value_ is **immutable** (you cannot 'change' a value, in any way), has no
122
+ localization in time and space, and is always typed (that is, it is always
123
+ accompanied by some identification of the type it belongs to).
124
+
125
+ As you can see, _type_ and _value_ are not the same concepts as _class_ and
126
+ _object_, with which you are probably familiar. Alf considers that the
127
+ latter are _implementations_ of the former. Alf assumes _valid_ implementations
128
+ (equality and hash methods must be correct) and _valid_ usage (objects used for
129
+ representing values are kept immutable in practice). Alf _assumes_ this, but
130
+ does not _enforce_ it: it is your responsibility to use Alf in conformance with
131
+ these preconditions. That being said, if you want **arrays, colors, ranges, or
132
+ whatever in your relations**, just do it! You can even join on them, restrict on
133
+ them, summarize on them, and so on:
134
+
135
+ % alf extend suppliers -- chars "name.chars.to_a" | alf --text restrict -- "chars.last == 's'"
136
+
137
+ +------+-------+---------+--------+-----------------+
138
+ | :sid | :name | :status | :city | :chars |
139
+ +------+-------+---------+--------+-----------------+
140
+ | S2 | Jones | 10 | Paris | [J, o, n, e, s] |
141
+ | S5 | Adams | 30 | Athens | [A, d, a, m, s] |
142
+ +------+-------+---------+--------+-----------------+
143
+
144
+ A last, very important word about values. **Null/nil is not a value**. Strictly
145
+ speaking therefore, you may not use null/nil inside your data files or datasources
146
+ representing relations. That being said, Alf provides specific support for handling
147
+ them, because they appear in today's databases in practice and because Alf aims at
148
+ being a tool that helps you tackle _practical_ problems. See the section with
149
+ title "What is Alf exactly?" later.
150
+
151
+ ### Tuples and Relations
152
+
153
+ Tuples (aka records) and relations are values as well, which explains why you
154
+ can have them inside relations!
155
+
156
+ * Logically speaking, a tuple is a set of (attribute name, attribute value)
157
+ pairs. Moreover, it does not contain two attributes with the same name and is
158
+ **not particularly ordered**. Also, **a tuple is a _value_, and is therefore
159
+ immutable**. Last, but not least, a tuple **does not admit nulls/nils**. Tuples
160
+ in Alf are simply implemented with ruby hashes, taken as tuples implementations.
161
+ Not all hashes are valid tuple implementations, of course (those containing nil
162
+ are not, for example). Alf _assumes_ valid tuples, but does not _enforce_ this
163
+ precondition. It's up to you to use Alf the right way! No support is or will
164
+ ever be provided for ordering tuple attributes. However, as hashes are ordered
165
+ in Ruby 1.9, Alf implements a best effort strategy to keep a friendly ordering
166
+ when rendering tuples and relations. This is a very good practical reason for
167
+ migrating to ruby 1.9 if not already done!
168
+
169
+ {:sid => "S1", :name => "Smith", :status => 20, :city => "London"}
170
+
171
+ * A _relation_ is a set of tuples. Being a set, a relation does **never contain
172
+ duplicates** (unlike SQL that works on bags, not on sets) and is **not
173
+ particularly ordered**. Moreover, all tuples of a relation must have the same
174
+ _heading_, that is, the same set of attribute (name, type) pairs. Also, **a
175
+ relation is a _value_, and is therefore immutable** and **does not admit null/nil**.
176
+ Alf being mainly an implementation of relational algebra (see section below)
177
+ it loosely considers any Iterator of tuples as a potentially valid relation
178
+ implementation (see later).
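The two definitions above can be illustrated with a minimal plain-Ruby sketch. It does not use Alf: it only assumes that tuples are Hashes and models a relation with the standard library's `Set`. The helper names `valid_tuple?` and `same_heading?` are hypothetical, chosen for this illustration, and the heading check compares attribute names only.

```ruby
require 'set'

# A tuple is modelled as a Hash of (attribute name => attribute value) pairs.
S1 = {:sid => "S1", :name => "Smith", :status => 20, :city => "London"}
S2 = {:sid => "S2", :name => "Jones", :status => 10, :city => "Paris"}

# Same pairs, different insertion order: Ruby still considers the Hashes equal.
S1_REORDERED = {:city => "London", :status => 20, :name => "Smith", :sid => "S1"}

# A relation is modelled as a Set of tuples, so duplicates collapse.
SUPPLIERS = Set[S1, S2, S1_REORDERED]

# Hypothetical helper: a tuple must not carry nil for any attribute.
def valid_tuple?(tuple)
  tuple.values.none?(&:nil?)
end

# Hypothetical helper: all tuples must share the same set of attribute names.
def same_heading?(relation)
  relation.map { |t| t.keys.to_set }.uniq.size <= 1
end

puts SUPPLIERS.size
puts SUPPLIERS.all? { |t| valid_tuple?(t) }
puts same_heading?(SUPPLIERS)
```

The `Set` does the work of "no duplicates, no particular order"; the nil and heading checks are exactly the preconditions that Alf is said to assume rather than enforce.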
179
+
180
+ ### Relational Algebra
181
+
182
+ In classical algebra, you can do computations like <code>(5 + 2) - 3</code>. In
183
+ relational algebra, you can do similar things on relations. Alf uses a prefix,
184
+ functional programming-oriented syntax for algebra expressions:
185
+
186
+ (minus (union :suppliers, xxx), yyy)
187
+
188
+ All relational operators take relation operands in input and return a relation
189
+ as output. We say that the relational algebra is _closed_ under its operators.
190
+ In practice, it means that operands may always be sub-expressions, **always**.
191
+
192
+ (minus (union (restrict :suppliers, lambda{ zzz }), xxx), yyy)
193
+
194
+ In shell, the closure property means that you can pipe alf invocations the way
195
+ you want! The same query, in shell:
196
+
197
+ alf restrict suppliers -- "zzz" | alf union xxx | alf minus yyy
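The closure property can be sketched in plain Ruby, without Alf. The `restrict`, `union` and `minus` helpers below are hypothetical stand-ins that operate on `Set`s of Hash tuples, and the `EXTRA` and `BANNED` relations are made-up data playing the role of `xxx` and `yyy`.

```ruby
require 'set'

# Hypothetical helpers (not Alf's implementation): each one takes Sets of
# Hash tuples and returns a new Set, so any call can be used as an operand.
def restrict(relation)
  relation.select { |t| yield(t) }.to_set
end

def union(left, right)
  left | right
end

def minus(left, right)
  left - right
end

SUPPLIERS = Set[
  {:sid => "S1", :city => "London"},
  {:sid => "S2", :city => "Paris"},
  {:sid => "S3", :city => "Paris"}
]
EXTRA  = Set[{:sid => "S4", :city => "London"}]
BANNED = Set[{:sid => "S3", :city => "Paris"}]

# Same shape as (minus (union (restrict :suppliers, ...), xxx), yyy)
RESULT = minus(union(restrict(SUPPLIERS) { |t| t[:city] == "Paris" }, EXTRA), BANNED)
puts RESULT.map { |t| t[:sid] }.sort.inspect
```

Because every helper returns the same kind of value it accepts, nesting needs no adapter code, which is the property the shell pipeline relies on as well.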
198
+
199
+ ## What is Alf exactly?
200
+
201
+ The Third Manifesto defines a series of prescriptions, proscriptions and very
202
+ strong suggestions for designing a truly relational _language_, called a _D_,
203
+ as an alternative to SQL for managing relational databases. This is far beyond
204
+ our objective with Alf, as we don't look at database aspects at all (persistence,
205
+ transactions, and so on.) and don't actually define a programming language either
206
+ (only a small functional ruby DSL).
207
+
208
+ Alf must simply be interpreted as a ruby library implementing (a variant of)
209
+ Date and Darwen's relational algebra. This library is designed as a set of operator
210
+ implementations, that work as tuple iterators taking other tuple iterators as
211
+ input. Under the pre-condition that you provide them _valid_ tuple iterators as
212
+ input (no duplicates, no nil, + other preconditions on an operator basis), the
213
+ result is a valid iterator as well. Unless explicitly stated otherwise, any
214
+ behavior observed when not respecting these preconditions, even an interesting
215
+ behavior, is not guaranteed and can change with tiny version changes (see section
216
+ about versioning policy at the end of this file).
217
+
218
+ ### In ruby
219
+
220
+ #
221
+ # Provided that :suppliers and :cities are valid relation representations
222
+ # (under the responsibility shared by you and the Reader and Environment
223
+ # subclasses you use -- see later), then,
224
+ #
225
+ op = Alf.lispy.compile{
226
+ (join (restrict :suppliers, lambda{ city == 'London' }), :cities)
227
+ }
228
+
229
+ # op is a thread-safe Enumerable of tuples, that can be taken as a valid
230
+ # relation representation. It can therefore be used as the input operand
231
+ # of any other expression. This is under Alf's responsibility, and any
232
+ # failure must be considered a bug!
233
+
234
+ ### In shell
235
+
236
+ #
237
+ # Provided that suppliers and cities are valid relation representations
238
+ # [something similar]
239
+ #
240
+ % alf restrict suppliers -- "city == 'London'" | alf join cities
241
+
242
+ # the resulting stream is a valid relation representation in the output
243
+ # stream format that you have selected (.rash by default). It can therefore
244
+ # be piped to another alf shell invocation, or saved to a file and re-read
245
+ # later (under the assumption that input and output data formats match, of
246
+ # course). [Something similar about responsibility and bug].
247
+
248
+ ### Coping with non-relational data sources (nil, duplicates, etc.)
249
+
250
+ Alf aims at being a tool that helps you tackle practical problems, and
251
+ denormalized and/or noisy data is one of them. Missing values occur. Duplicates
252
+ abound in SQL databases lacking primary keys, and so on. Using Alf's relational
253
+ operators on such inputs is not a good idea, because it is a strong precondition
254
+ violation. This is not because relational theory is weak, but because extending
255
+ it to handle null/nil and duplicates correctly has been proven at best a nightmare,
256
+ and at worst a mess. As a practical exercise, try to extend classical algebra
257
+ with versions of +, -, * and / that handle nil in such a way that the resulting
258
+ theory is sound and still looks intuitive! Then do it on boolean algebra with
259
+ _and_, _or_ and _not_. Then, add null/nil to classical set theory. Classical
260
+ algebra, boolean algebra, and set theory are important building blocks behind
261
+ relational algebra because almost all of its operators are defined on top of
262
+ them...
263
+
264
+ So what? The approach chosen in Alf to handle this conflict is very pragmatic.
265
+ First of all, Alf implements a best effort strategy -- where possible -- to
266
+ remain friendly in presence of null/nil on attributes that have no influence on
267
+ an operator's job. For example, the query below will certainly fail if _status_
268
+ is null/nil, but it probably won't fail if any other attribute is nil.
269
+
270
+ % alf restrict suppliers -- "status > 10"
271
+
272
+ This best-effort strategy is not enough, and strictly speaking, must be considered
273
+ unsound (for example, it strongly hurts optimization possibilities). Therefore,
274
+ we strongly encourage you to go a step further: **if relational operators want
275
+ true relations as input, please, give them!**. For this, Alf also provides a few
276
+ non-relational operators in addition to relational ones. Those operators must be
277
+ interpreted as "pre-relational" operators, in the sense that they help obtaining
278
+ valid relation representations from invalid ones. Provided that you use them
279
+ correctly, their output can safely be used as input of a relational operator.
280
+ You'll find,
281
+
282
+ * <code>alf autonum</code> -- ensure no duplicates by generating a unique attribute
283
+ * <code>alf compact</code> -- brute-force duplicates removal
284
+ * <code>alf defaults</code> -- replace nulls/nil by valid values, on an attribute
285
+ basis
286
+
287
+ Play the game, it's easy!
288
+
289
+ - _Give id, name and status of suppliers whose status is greater than 10_
290
+ - Hey man, we don't know supplier's status for all of them! What about the others?
291
+ - _Ignore them_
292
+ - No problem dude!
293
+
294
+ % alf defaults --strict suppliers -- sid '' name '' status 0 | alf restrict -- "status > 10"
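The idea behind this pipeline can be sketched in plain Ruby. The `defaults` method below is a hypothetical illustration, not Alf's implementation. It assumes that strict mode keeps only the attributes for which a default value is given, which matches the "id, name and status" request above. The `RAW` data is made up for the example.

```ruby
# Hypothetical "pre-relational" helper: replace nil by a default value,
# attribute by attribute. With strict = true (an assumption about what
# --strict means), only the attributes listed in defs are kept.
def defaults(tuples, defs, strict = false)
  tuples.map do |t|
    filled = {}
    defs.each { |name, default| filled[name] = t[name].nil? ? default : t[name] }
    strict ? filled : t.merge(filled)
  end
end

# Made-up noisy input: one unknown status, one unknown city.
RAW = [
  {:sid => "S1", :name => "Smith", :status => 20,  :city => "London"},
  {:sid => "S2", :name => "Jones", :status => nil, :city => "Paris"},
  {:sid => "S3", :name => "Blake", :status => 30,  :city => nil}
]

CLEAN = defaults(RAW, {:sid => "", :name => "", :status => 0}, true)

# The restriction is now safe: no tuple has a nil status any more.
KEPT = CLEAN.select { |t| t[:status] > 10 }
puts KEPT.inspect
```

Running the same restriction directly on `RAW` would raise a `NoMethodError` as soon as `nil > 10` is evaluated, which is the precondition violation the text warns about.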
295
+
296
+ ### Alf is duck-typed
297
+
298
+ The relational theory is often considered under a statically-typed point
299
+ of view. When considering tuples and relations, for example, the notion of
300
+ _heading_, a set of (name,type) pairs, is central. For example, a heading for
301
+ a supplier tuple/relation could be:
302
+
303
+ {:sid => String, :name => Name, :status => Integer, :city => String}
304
+
305
+ Most relational operators have preconditions in terms of the headings of their
306
+ operands. For example, _minus_ and _union_ require their operands to have same
307
+ heading, while _rename_ requires renamed attributes to exist in operand's
308
+ heading, and so on. Given an expression in relational algebra, it is always
309
+ possible to compute the heading of the resulting relation, by statically
310
+ analyzing the whole query expression in the light of a catalog of typed
311
+ operators. This way, a tool can check that a query is statically valid, i.e.
312
+ that it respects operator preconditions. While this approach has the major
313
+ advantage of allowing strong optimizations, it also has a few drawbacks (as
314
+ knowing the heading of used datasources in advance) and is difficult to marry
315
+ with dynamically-typed languages like Ruby. Therefore, Alf takes another approach,
316
+ which is similar to duck-typing. In essence, this approach can be summarized as
317
+ follows:
318
+
319
+ - _You have the responsibility of ensuring that the evaluation of your query
320
+ will succeed and will return valid results_
321
+ - No problem dude!
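For readers who still want a safety net, a heading can be derived from the tuples at run time. The sketch below is plain Ruby and only an illustration: `heading` and `same_heading?` are hypothetical helper names, and nothing here is part of Alf.

```ruby
# Derive a heading, i.e. a Hash of (attribute name => class), from one tuple.
def heading(tuple)
  Hash[tuple.map { |name, value| [name, value.class] }]
end

# Check the "same heading" precondition of union/minus on two collections
# of tuples by comparing the headings of all tuples involved.
def same_heading?(left, right)
  (left + right).map { |t| heading(t) }.uniq.size <= 1
end

LONDON = [{:sid => "S1", :status => 20}]
PARIS  = [{:sid => "S2", :status => 10}]
CITIES = [{:city => "London", :country => "England"}]

puts heading(LONDON.first).inspect
puts same_heading?(LONDON, PARIS)
puts same_heading?(LONDON, CITIES)
```

Such a check has to visit every tuple, which is one reason a duck-typed approach leaves the responsibility to the caller instead of paying that cost on every operator.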
322
+
323
+ ## Getting started in shell
324
+
325
+ % alf --help
326
+
327
+ The help command will display the list of available operators. Each of them is
328
+ completely described with 'alf help OPERATOR'. They all have a similar invocation
329
+ syntax in shell:
330
+
331
+ % alf operator operands... -- args...
332
+
333
+ For example, try the following:
334
+
335
+ # display suppliers that live in Paris
336
+ % alf restrict suppliers -- "city == 'Paris'"
337
+
338
+ # join suppliers and cities (no args here)
339
+ % alf join suppliers cities
340
+
341
+ ### Recognized data streams/files (.rash files)
342
+
343
+ For educational purposes, 'suppliers' and 'cities' inputs are magically resolved
344
+ as denoting the files examples/suppliers.rash and examples/cities.rash,
345
+ respectively. You'll find other data files: parts.rash, supplies.rash that are
346
+ resolved magically as well and with which you can play. For non-educational
347
+ purposes, operands may always be explicit files, or you can force the folder in
348
+ which datasource files have to be found:
349
+
350
+ # The following invocations are equivalent
351
+ % alf restrict /tmp/foo.rash -- "..."
352
+ % alf --env=/tmp restrict foo -- "..."
353
+
354
+ A .rash file is simply a file in which each line is a ruby Hash, intended to
355
+ represent a tuple. Under theory-driven preconditions, a .rash file can be seen
356
+ as a valid (straightforward but useful) physical representation of a relation!
357
+ When used in shell, alf dumps query results in the .rash format by default,
358
+ which opens the ability of piping invocations! Indeed, unary operators read their
359
+ operand on standard input if it is not specified as a command argument. For example, the
360
+ invocation below is equivalent to the one given above.
361
+
362
+ # display suppliers that live in Paris
363
+ % cat examples/suppliers.rash | alf restrict -- "city == 'Paris'"
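A reader for this format can be sketched in a few lines of plain Ruby, assuming exactly what the text states: one Ruby Hash literal per line. The `read_rash` method is a hypothetical name, not Alf's reader, and it relies on `Kernel#eval`, so it should only ever be given trusted input.

```ruby
require 'stringio'

# Hypothetical .rash reader: evaluates each non-empty line as a Ruby Hash
# literal and returns the tuples as an Array. Uses eval: trusted input only.
def read_rash(io)
  io.each_line.map(&:strip).reject(&:empty?).map do |line|
    tuple = eval(line)
    raise ArgumentError, "not a tuple: #{line}" unless tuple.is_a?(Hash)
    tuple
  end
end

# Stand-in for a file or standard input.
INPUT = StringIO.new(<<-EOF)
{:sid => "S1", :name => "Smith", :status => 20, :city => "London"}
{:sid => "S2", :name => "Jones", :status => 10, :city => "Paris"}
EOF

PARIS = read_rash(INPUT).select { |t| t[:city] == "Paris" }
puts PARIS.inspect
```

Since the reader accepts any IO-like object, the same method would work on `$stdin`, which mirrors how piped invocations hand tuples from one command to the next.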
364
+
365
+ Similarly, when only one operand is present in invocations of binary operators,
366
+ they read their left operand from standard input. Therefore, the join given in
367
+ previous section can also be written as follows:
368
+
369
+ % cat examples/suppliers.rash | alf join cities
370
+
371
+ The relational algebra is _closed_ under its operators, which means that these
372
+ operators take relations as operands and return a relation. Therefore operator
373
+ invocations can be nested, that is, operands can be other relational expressions.
374
+ When you use alf in a shell, it simply means that you can pipe operators as you
375
+ want:
376
+
377
+ % alf show --rash suppliers | alf join cities | alf restrict -- "status > 10"
378
+
379
+ ### Obtaining a friendly output
380
+
381
+ The show command (which is **not** a relational operator) can be used to obtain
382
+ a more friendly output:
383
+
384
+ # it renders a text table by default
385
+ % alf show [--text] suppliers
386
+
387
+ +------+-------+---------+--------+
388
+ | :sid | :name | :status | :city |
389
+ +------+-------+---------+--------+
390
+ | S1 | Smith | 20 | London |
391
+ | S2 | Jones | 10 | Paris |
392
+ | S3 | Blake | 30 | Paris |
393
+ | S4 | Clark | 20 | London |
394
+ | S5 | Adams | 30 | Athens |
395
+ +------+-------+---------+--------+
396
+
397
+ # and reads from standard input without argument!
398
+ % alf restrict suppliers -- "city == 'Paris'" | alf show
399
+
400
+ +------+-------+---------+-------+
401
+ | :sid | :name | :status | :city |
402
+ +------+-------+---------+-------+
403
+ | S2 | Jones | 10 | Paris |
404
+ | S3 | Blake | 30 | Paris |
405
+ +------+-------+---------+-------+
406
+
407
+ Other formats can be obtained (see 'alf help show'). For example, you can generate
408
+ a .yaml file, as follows:
409
+
410
+ % alf restrict suppliers -- "city == 'Paris'" | alf show --yaml
411
+
412
+ ### Executing .alf files
413
+
414
+ You'll also find .alf files in the examples folder, that contain more complex
415
+ examples in the Ruby functional syntax (see section below).
416
+
417
+ % cat examples/group.alf
418
+ #!/usr/bin/env alf
419
+ (group :supplies, [:pid, :qty], :supplying)
420
+
421
+ You can simply execute these files with alf directly as follows:
422
+
423
+ # the following works, as well as the shortcut 'alf show group'
424
+ % alf examples/group.alf | alf show
425
+
426
+ +------+-----------------+
427
+ | :sid | :supplying |
428
+ +------+-----------------+
429
+ | S1 | +------+------+ |
430
+ | | | :pid | :qty | |
431
+ | | +------+------+ |
432
+ | | | P1 | 300 | |
433
+ | | | P2 | 200 | |
434
+ ...
435
+
436
+ Mimicking the ruby executable, the following invocation is also possible:
437
+
438
+ % alf -e "(restrict :suppliers, lambda{ city == 'Paris' })"
439
+
440
+ where the argument is a relational expression in Alf's Lispy dialect, which
441
+ is detailed in the next section.
442
+
443
+ ## Lispy expressions
444
+
445
+ If you take a look at .alf example files, you'll find functional ruby expressions
446
+ like the following:
447
+
448
+ % cat examples/minus.alf
449
+
450
+ # Give all suppliers, except those living in Paris
451
+ (minus :suppliers,
452
+ (restrict :suppliers, lambda{ city == 'Paris' }))
453
+
454
+ # This is a contrived example for illustrating minus, as the
455
+ # following is equivalent
456
+ (restrict :suppliers, lambda{ city != 'Paris' })
457
+
458
+ You can simply execute such expressions with the alf command line itself (the
459
+ three following invocations return the same result):
460
+
461
+ % alf examples/minus.alf | alf show
462
+ % alf show minus
463
+ % alf -e "(restrict :suppliers, lambda{ city != 'Paris' })" | alf show
464
+
465
+ Symbols are magically resolved from the environment, which is wired to the
466
+ examples by default. See the dedicated sections below to update this behavior
467
+ to your needs.
468
+
469
+ ### Algebra is closed under its operators!
470
+
471
+ Of course, from the closure property of a relational algebra (that states that
472
+ operators work on relations and return relations), you can use a sub expression
473
+ *every time* a relational operand is expected, every time:
474
+
475
+ # Compute the total qty supplied in each country together with the subset
476
+ # of products shipped there. Only consider suppliers that have a status
477
+ # greater than 10, however.
478
+ (summarize \
479
+ (join \
480
+ (join (restrict :suppliers, lambda{ status > 10 }),
481
+ :supplies),
482
+ :cities),
483
+ [:country],
484
+ :which => Agg::group(:pid),
485
+ :total => Agg::sum{ qty })
486
+
487
+ Of course, complex queries quickly become unreadable that way. But you can always
488
+ split complex tasks into simpler ones using _with_:
489
+
490
+ with( :kept_suppliers => (restrict :suppliers, lambda{ status > 10 }),
491
+ :with_countries => (join :kept_suppliers, :cities),
492
+ :supplying => (join :with_countries, :supplies) ) do
493
+ (summarize :supplying,
494
+ [:country],
495
+ :which => Agg::group(:pid),
496
+ :total => Agg::sum{ qty })
497
+ end
498
+
499
+ And here is the result!
500
+
501
+ +----------+----------+--------+
502
+ | :country | :which | :total |
503
+ +----------+----------+--------+
504
+ | England | +------+ | 2200 |
505
+ | | | :pid | | |
506
+ | | +------+ | |
507
+ | | | P1 | | |
508
+ | | | P2 | | |
509
+ | | | P3 | | |
510
+ | | | P4 | | |
511
+ | | | P5 | | |
512
+ | | | P6 | | |
513
+ | | +------+ | |
514
+ | France | +------+ | 200 |
515
+ | | | :pid | | |
516
+ | | +------+ | |
517
+ | | | P2 | | |
518
+ | | +------+ | |
519
+ +----------+----------+--------+
520
+
521
+
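To make the closure property concrete outside of Alf, here is a minimal plain-Ruby sketch in which relations are modelled as arrays of hashes. The `restrict` and `project` helpers below are hypothetical stand-ins written for this illustration, not Alf's implementation, and the sample tuples are illustrative as well. Because each helper takes a relation and returns a relation, calls nest freely:

```ruby
# Hypothetical helpers for illustration only (not Alf's API or implementation).
# A "relation" is modelled here as an Array of Hashes (tuples).

# Keep only the tuples for which the block returns true.
def restrict(relation, &predicate)
  relation.select(&predicate)
end

# Keep only the listed attributes, removing duplicate tuples.
def project(relation, attributes)
  relation.map { |tuple| tuple.select { |key, _| attributes.include?(key) } }.uniq
end

# Illustrative data
suppliers = [
  { :sid => 'S1', :city => 'London', :status => 20 },
  { :sid => 'S2', :city => 'Paris',  :status => 10 },
  { :sid => 'S3', :city => 'Paris',  :status => 30 },
  { :sid => 'S4', :city => 'London', :status => 20 }
]

# A sub-expression is used where a relational operand is expected.
cities = project(restrict(suppliers) { |t| t[:status] > 10 }, [:city])
```

Here `cities` holds one tuple per distinct city among the suppliers whose status is greater than 10.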
522
+ ### Going further
523
+
524
+ For now, the Ruby API is documented in the commandline help itself (a cheatsheet
525
+ or something will be provided as soon as possible). For example, you'll find the
526
+ allowed syntaxes for RESTRICT as follows:
527
+
528
+ % alf help restrict
529
+
530
+ ...
531
+ API & EXAMPLE
532
+
533
+ # Restrict to suppliers with status greater than 20
534
+ (restrict :suppliers, lambda{ status > 20 })
535
+
536
+ # Restrict to suppliers that live in London
537
+ (restrict :suppliers, lambda{ city == 'London' })
538
+ ...
539
+
540
+ ## Interfacing Alf in Ruby
541
+
542
+ ### Calling commands 'à la' shell
543
+
544
+ For simple cases, the easiest way of using Alf in ruby is probably to mimic
545
+ what you have in shell:
546
+
547
+ % alf restrict suppliers -- "city == 'Paris'"
548
+
549
+ Then, in ruby:
550
+
551
+ #
552
+ # 1. create an engine on an environment (see section about environments later)
553
+ # 2. run a command
554
+ # 3. op is a thread-safe enumerable of tuples (see the Lispy section below)
555
+ #
556
+ lispy = Alf.lispy(Alf::Environment.examples)
557
+ op = lispy.run(['restrict', 'suppliers', '--', "city == 'Paris'"])
558
+
559
+ If this kind of API is not sufficiently expressive for you, you'll have to learn
560
+ the APIs in more depth and use the Lispy functional style that Alf provides, which can
561
+ be compiled and used as explained in the next section.
562
+
563
+ ### Compiling lispy expressions
564
+
565
+ If you want to use Alf in ruby directly (that is, not in shell or by executing
566
+ .alf files), you can simply compile expressions and use resulting operators as
567
+ follows:
568
+
569
+ #
570
+ # Expressions can simply be compiled as illustrated below. We use the
571
+ # examples environment here, see the dedicated section later about other
572
+ # available environments.
573
+ #
574
+ lispy = Alf.lispy(Alf::Environment.examples)
575
+ op = lispy.compile do
576
+ (restrict :suppliers, lambda{ city == 'London' })
577
+ end
578
+
579
+ #
580
+ # Returned _op_ is an enumerable of ruby hashes. Provided that datasets
581
+ # offered by the environment (:suppliers here) can be enumerated more than
582
+ # once, the operator may be used multiple times and is even thread safe!
583
+ #
584
+ op.each do |tuple|
585
+ # tuple is a ruby Hash
586
+ end
587
+
588
+ #
589
+ # Now, maybe you want to reuse op in a larger query, for example
590
+ # by projecting on the city attribute... Here is how 'with' expressions
591
+ # can be handled in that case
592
+ #
593
+ projection = lispy.with(:kept_suppliers => op) do
594
+ (project :kept_suppliers, [:city])
595
+ end
596
+
597
+ ## Going further
598
+
599
+ ### Using/Implementing other Environments
600
+
601
+ An Environment instance is passed as the first argument of <code>Alf.lispy</code>
601
+ and is responsible for resolving named datasets. A base class Environment::Folder
603
+ is provided with the Alf distribution, with a factory method on the Environment
604
+ class itself.
605
+
606
+ env = Alf::Environment.folder("path/to/a/folder")
607
+
608
+ An environment built that way will look for .rash and .alf files in the specified
609
+ folder and sub-folders. I'll of course strongly consider any contribution
610
+ implementing the Environment contract on top of SQL or NoSQL databases or anything
611
+ that can be useful to manipulate with relational algebra. Such contributions can
612
+ be added to the project directly, in the lib/alf/environment folder, for example.
613
+ A base template would look like:
614
+
615
+ class Foo < Alf::Environment
616
+
617
+ #
618
+ # You should at least implement the _dataset_ method that resolves a
619
+ # name (a Symbol instance) to an Enumerable of tuples (typically a
620
+ # Reader). See Alf::Environment for exact contract details.
621
+ #
622
+ def dataset(name)
623
+ end
624
+
625
+ end
626
+
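As a rough illustration of the dataset contract, the sketch below resolves names from an in-memory Hash. It is deliberately standalone: `MemoryEnvironment` is a hypothetical name and does not subclass Alf::Environment, so that it runs without Alf installed. A real contribution would extend Alf::Environment and follow its exact contract.

```ruby
# Hypothetical, standalone sketch: resolves dataset names from a Hash.
# A real contribution would subclass Alf::Environment instead.
class MemoryEnvironment

  def initialize(datasets)
    @datasets = datasets
  end

  # Resolves a name (a Symbol) to an Enumerable of tuples (Hashes).
  # Raising on unknown names is a choice made for this sketch.
  def dataset(name)
    @datasets.fetch(name) do
      raise ArgumentError, "No such dataset: #{name.inspect}"
    end
  end

end

env = MemoryEnvironment.new(
  :cities => [ { :city => 'London', :country => 'England' },
               { :city => 'Paris',  :country => 'France'  } ]
)
countries = env.dataset(:cities).map { |tuple| tuple[:country] }
```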
627
+ ### Adding file decoders, aka Readers
628
+
629
+ Environments should not be confused with Readers (see Reader class and its
630
+ subclasses). While the former resolve named datasets, the latter decode files
631
+ and/or other resources as tuple enumerables. Environments typically serve Reader
632
+ instances in response to dataset resolving.
633
+
634
+ Reader implementations decoding .rash and .alf files are provided in the main
635
+ alf.rb file. It's relatively easy to implement the Reader contract by extending
636
+ the Reader class and implementing an each method. Once again, contributions are
637
+ very welcome in lib/alf/reader (.csv files, .log files, and so on). A basic
638
+ template for this is as follows:
639
+
640
+ class Bar < Alf::Reader
641
+
642
+ #
643
+ # You should at least implement each, see Alf::Reader which provides a
644
+ # base implementation and a few tools
645
+ #
646
+ def each
647
+ # [...]
648
+ end
649
+
650
+ # By registering it, the Folder environment will automatically
651
+ # recognize and decode .bar files correctly!
652
+ Alf::Reader.register(:bar, [".bar"], self)
653
+
654
+ end
655
+
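To give an idea of what an each method has to do, here is a standalone sketch that decodes naive comma-separated text (no quoting or escaping) into tuples. `NaiveCsvReader` is a hypothetical name; it includes Enumerable rather than extending Alf::Reader so that it runs without Alf installed, and it is only an assumption about what a .csv contribution might look like.

```ruby
require 'stringio'

# Hypothetical, standalone sketch of a reader for naive comma-separated
# text. A real contribution would subclass Alf::Reader and register itself.
class NaiveCsvReader
  include Enumerable

  # io is any object responding to each_line (File, StringIO, ...)
  def initialize(io)
    @io = io
  end

  # Yields one Hash per data line, using the first line as attribute names.
  # All values are kept as Strings.
  def each
    return to_enum(:each) unless block_given?
    header = nil
    @io.each_line do |line|
      values = line.chomp.split(',')
      next if values.empty?
      if header.nil?
        header = values.map { |name| name.to_sym }
      else
        yield Hash[header.zip(values)]
      end
    end
  end

end

reader = NaiveCsvReader.new(StringIO.new("sid,city\nS1,London\nS2,Paris\n"))
tuples = reader.to_a
```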
656
+ ### Adding outputters, aka Renderers
657
+
658
+ Similarly, you can contribute renderers to output relations in html, or whatever
659
+ format you would consider interesting. See the Renderer class, and consider the
660
+ following template for contributions in lib/alf/renderer:
661
+
662
+ class Glim < Alf::Renderer
663
+
664
+ #
665
+ # You should at least implement the execute method that renders tuples
666
+ # given in _input_ (an Enumerable of tuples) on the output buffer
667
+ # and returns the latter. See Alf::Renderer for the exact contract
668
+ # details.
669
+ #
670
+ def execute(output = $stdout)
671
+ # [...]
672
+ output
673
+ end
674
+
675
+
676
+ # By registering it, the output options of 'alf show' will
677
+ # automatically provide your --glim contribution
678
+ Alf::Renderer.register(:glim, "as a .glim file", self)
679
+
680
+ end
681
+
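The sketch below shows the shape of an execute method under the assumptions stated in the template: it reads tuples from an input, writes them on the output buffer and returns that buffer. `TsvRenderer` and its tab-separated format are hypothetical, and the class is standalone (it does not extend Alf::Renderer) so that it runs without Alf installed.

```ruby
require 'stringio'

# Hypothetical, standalone sketch rendering tuples as tab-separated lines.
# A real contribution would subclass Alf::Renderer and register itself.
class TsvRenderer

  # input is an Enumerable of tuples (Hashes)
  def initialize(input)
    @input = input
  end

  # Writes a header line followed by one line per tuple on the output
  # buffer, then returns that buffer. Attribute names are taken from
  # the first tuple.
  def execute(output = $stdout)
    header = nil
    @input.each do |tuple|
      if header.nil?
        header = tuple.keys
        output << header.join("\t") << "\n"
      end
      output << header.map { |key| tuple[key] }.join("\t") << "\n"
    end
    output
  end

end

buffer = TsvRenderer.new([
  { :sid => 'S1', :city => 'London' },
  { :sid => 'S2', :city => 'Paris'  }
]).execute(StringIO.new)
```

Passing a StringIO keeps the rendered text in memory (available through `buffer.string`), while the default writes to standard output.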
682
+ ## Related Work & Tools
683
+
684
+ - You should certainly have a look at the Third Manifesto website: http://www.thethirdmanifesto.com/
685
+ - Why not read the {http://www.dcs.warwick.ac.uk/~hugh/TTM/DBE-Chapter01.pdf
686
+ third manifesto paper} itself?
687
+ - Also have a look at {http://www.dcs.warwick.ac.uk/~hugh/TTM/Projects.html other
688
+ implementation projects}, especially {http://dbappbuilder.sourceforge.net/Rel.php Rel}
689
+ which provides an implementation of the TUTORIAL D language.
690
+ - {https://github.com/dkubb/veritas Dan Kubb's Veritas} project is also worth considering
691
+ in the Ruby community. While very similar to Alf in providing a pure ruby
692
+ algebra implementation, Veritas mostly provides a framework for manipulating
693
+ and statically analyzing algebra expressions so as to be able to
694
+ {https://github.com/dkubb/veritas-optimizer optimize them} and
695
+ {https://github.com/dkubb/veritas-sql-generator compile them to SQL}. We are
696
+ working together with Dan Kubb to see how Alf and Veritas could be closer to
697
+ each other in the future, if not in their codebase, at least in using the very
698
+ same terminology for the same concepts.
699
+
700
+ ## Contributing
701
+
702
+ ### Alf is open source
703
+
704
+ You know the rules:
705
+
706
+ * The code is on github https://github.com/blambeau/alf
707
+ * Please report any problem or bug in the issue tracker on github
708
+ * Don't hesitate to fork and send me a pull request for any contribution/idea!
709
+
710
+ Alf is distributed under an MIT licence. Please let me know if it does not fit
711
+ your needs and I'll see what I can do!
712
+
713
+ ### Internals -- Tribute to Sinatra
714
+
715
+ Alf's code style is heavily inspired by what I found in Sinatra when looking
716
+ at its internals a few months ago. Alf, like Sinatra, is mostly implemented in a
717
+ single file, lib/alf.rb. Everything is there except additional contributions
718
+ (in lib/alf/...). You'll need an editor or IDE that supports code folding/unfolding.
719
+ Then, follow the guide:
720
+
721
+ 1. Fold everything but the Alf module.
722
+ 2. Main concepts, first level of abstraction, should fit on the screen
723
+ 3. Unfold the concept you're interested in, and return to the previous bullet
724
+
725
+ ### Roadmap
726
+
727
+ Below is what I've imagined about Alf's future. However, this is to be interpreted
728
+ as my own wish list, while I would love hearing yours instead.
729
+
730
+ - Towards 1.0.0, I would like to stabilize and document Alf public APIs as well
731
+ as internals (a few concepts are still unstable there). Alf also has a certain
732
+ number of limitations that are worth overcoming for version 1.0.0. The latter
733
+ include the semantically wrong way of applying joins on sub-relations, the
734
+ impossibility to use Lispy expressions on sub-relations in extend, and the error
735
+ management which is unspecific and unfriendly so far.
736
+ - I would also like to start collecting Reader, Renderer and Environment
737
+ contributions for common data sources (SQL, NoSQL, CSV, LOGS) and output
738
+ formats (HTML, XML, JSON). Contributions could be either developed as separate
739
+ gem projects or distributed with Alf's gem and source code; I still need to
740
+ decide the exact policy (suggestions are more than welcome here).
741
+ - Alf will remain a practical tool before everything else. In the medium term,
742
+ I would like to complete the set of available operators (relational and non-
743
+ relational ones). Some of them will be operators described in D & D books
744
+ while others will be new suggestions of mine.
745
+ - In the long term Alf should be able to avoid loading tuples in memory (under
746
+ a certain number of conditions on datasources) for almost all queries.
747
+ - Without targeting a fast tool at all, I would also like Alf to provide a basic
748
+ optimizer that would be able to push equality restrictions down and materialize
749
+ sub-expressions used more than once in with expressions.
750
+
751
+ ### Versioning policy
752
+
753
+ Alf respects {http://semver.org/ semantic versioning}, which means that it has
754
+ a X.Y.Z version number and follows a few rules:
755
+
756
+ - The public API is made of both the commandline tool and the Lispy
757
+ dialect, and will become stable with version 1.0.0 in the near future.
758
+ - Backward compatible bug fixes will increase Z.
759
+ - New features and enhancements that do not break backward compatibility of the
760
+ public API will increase the Y number.
761
+ - Non backward compatible changes of the public API will increase the X number.
762
+
763
+ All classes and modules but the Alf module itself and the Lispy DSL are part of
764
+ the private API and may change at any time. A best-effort strategy is followed
765
+ to avoid breaking internals on tiny (Z) version increases.
766
+
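The X.Y.Z rules above can be sketched as a small helper. `next_version` is a hypothetical function written for this illustration and is not part of Alf; resetting the lower numbers to zero on a Y or X increase follows the semantic versioning specification rather than anything stated in this README.

```ruby
# Hypothetical helper illustrating the versioning rules; not part of Alf.
# kind is :fix (backward compatible bug fix), :feature (backward
# compatible enhancement) or :breaking (incompatible public API change).
def next_version(version, kind)
  x, y, z = version.split('.').map { |part| Integer(part) }
  case kind
  when :fix      then [x, y, z + 1]
  when :feature  then [x, y + 1, 0]
  when :breaking then [x + 1, 0, 0]
  else raise ArgumentError, "Unknown kind of change: #{kind.inspect}"
  end.join('.')
end

after_fix      = next_version('0.9.0', :fix)
after_feature  = next_version('0.9.0', :feature)
after_breaking = next_version('0.9.0', :breaking)
```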
767
+ ## Enjoy Alf!
768
+
769
+ - No problem dude!