bmg 0.17.5 → 0.18.1
- checksums.yaml +4 -4
- data/Gemfile +0 -3
- data/README.md +240 -57
- data/lib/bmg.rb +7 -0
- data/lib/bmg/algebra.rb +11 -0
- data/lib/bmg/algebra/shortcuts.rb +14 -0
- data/lib/bmg/operator.rb +1 -0
- data/lib/bmg/operator/allbut.rb +22 -0
- data/lib/bmg/operator/autosummarize.rb +20 -4
- data/lib/bmg/operator/autowrap.rb +8 -0
- data/lib/bmg/operator/image.rb +26 -2
- data/lib/bmg/operator/page.rb +1 -7
- data/lib/bmg/operator/transform.rb +94 -0
- data/lib/bmg/reader.rb +1 -0
- data/lib/bmg/reader/text_file.rb +56 -0
- data/lib/bmg/relation.rb +13 -2
- data/lib/bmg/relation/proxy.rb +63 -0
- data/lib/bmg/relation/spied.rb +1 -1
- data/lib/bmg/sql/relation.rb +0 -1
- data/lib/bmg/support.rb +3 -0
- data/lib/bmg/support/keys.rb +4 -0
- data/lib/bmg/support/ordering.rb +20 -0
- data/lib/bmg/support/output_preferences.rb +44 -0
- data/lib/bmg/support/tuple_algebra.rb +6 -0
- data/lib/bmg/support/tuple_transformer.rb +63 -0
- data/lib/bmg/type.rb +25 -0
- data/lib/bmg/version.rb +2 -2
- data/lib/bmg/writer/csv.rb +18 -7
- data/tasks/test.rake +9 -2
- metadata +21 -15
checksums.yaml
CHANGED
````diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1e47d2972990cc85ae0a3b99fa6a79caf97176e46c7d70cdbbbb35ffdb3df21a
+  data.tar.gz: dd7ddf0c123a9347f517da44cc923ec9664c8a1fd12b9e8f0ed08677be2dcdd4
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 929f4c67a01d756b484e9ae27e5465173a4a1850686a173a8ab293dbe1e1ba01899c1ab21a8e2f29251378d0354287e379e7573ae551d3aef0583a0690640375
+  data.tar.gz: 3a92ab6c7708badf0464a94c496000eb55d233fb979bfd7128b71f3229347fa56c38be381815dbbc7eb0f741b58138f89551080e7670b3b044dfbba6571ac892
````
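In a packaged gem, `checksums.yaml` records SHA256 and SHA512 digests of the archive's `metadata.gz` and `data.tar.gz` members. The 0.17.5 values were not captured on this page. The sketch below checks such a file using Ruby's standard `digest` and `yaml` libraries. It assumes both members have already been extracted into memory. The helper name `verify_gem_checksums` is hypothetical and is not a RubyGems API.

```ruby
require "digest"
require "yaml"

# Maps each algorithm section of checksums.yaml to a stdlib digest class.
DIGESTS = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }.freeze

# Hypothetical helper: true when every digest listed in the YAML matches
# the bytes of the corresponding archive member.
#   checksums_yaml - contents of checksums.yaml (String)
#   members        - Hash of member name ("metadata.gz", ...) => raw bytes
# Raises KeyError for an algorithm section it does not know about.
def verify_gem_checksums(checksums_yaml, members)
  YAML.safe_load(checksums_yaml).all? do |algo, entries|
    klass = DIGESTS.fetch(algo)
    entries.all? do |name, expected|
      data = members[name]
      !data.nil? && klass.hexdigest(data) == expected
    end
  end
end
```

A missing member counts as a failure rather than being silently skipped.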
data/Gemfile
CHANGED
data/README.md
CHANGED
````diff
@@ -1,16 +1,30 @@
 # Bmg, a relational algebra (Alf's successor)!
 
+[![Build Status](https://travis-ci.com/enspirit/bmg.svg?branch=master)](https://travis-ci.com/enspirit/bmg)
+
 Bmg is a relational algebra implemented as a ruby library. It implements the
 [Relation as First-Class Citizen](http://www.try-alf.org/blog/2013-10-21-relations-as-first-class-citizen)
-paradigm contributed with Alf a few years ago.
-
-
-
-
-
-
-
-
+paradigm contributed with [Alf](http://www.try-alf.org/) a few years ago.
+
+Bmg can be used to query relations in memory, from various files, SQL databases,
+and any data source that can be seen as serving relations. Cross data-sources
+joins are supported, as with Alf. For differences with Alf, see a section
+further down this README.
+
+## Outline
+
+* [Example](#example)
+* [Where are base relations coming from?](#where-are-base-relations-coming-from)
+* [Memory relations](#memory-relations)
+* [Connecting to SQL databases](#connecting-to-sql-databases)
+* [Reading files (csv, excel, text)](#reading-files-csv-excel-text)
+* [Your own relations](#your-own-relations)
+* [List of supported operators](#supported-operators)
+* [How is this different?](#how-is-this-different)
+* [... from similar libraries](#-from-similar-libraries)
+* [... from Alf](#-from-alf)
+* [Contribute](#contribute)
+* [License](#license)
 
 ## Example
 
````
````diff
@@ -27,7 +41,7 @@ suppliers = Bmg::Relation.new([
 ])
 
 by_city = suppliers
-  .
+  .exclude(status: 30)
   .extend(upname: ->(t){ t[:name].upcase })
   .group([:sid, :name, :status], :suppliers_in)
 
````
````diff
@@ -35,76 +49,158 @@ puts JSON.pretty_generate(by_city)
 # [{...},...]
 ```
 
-##
+## Where are base relations coming from?
+
+Bmg sees relations as sets/enumerable of symbolized Ruby hashes. The following
+sections show you how to get them in the first place, to enter Relationland.
+
+### Memory relations
+
+If you have an Array of Hashes -- in fact any Enumerable -- you can easily get
+a Relation using either `Bmg::Relation.new` or `Bmg.in_memory`.
+
+```ruby
+# this...
+r = Bmg::Relation.new [{id: 1}, {id: 2}]
+
+# is the same as this...
+r = Bmg.in_memory [{id: 1}, {id: 2}]
+
+# entire algebra is available on `r`
+```
+
````
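The README text above says Bmg sees a relation as a set/enumerable of symbolized Ruby hashes. The plain-Ruby sketch below illustrates that data model only. It is not Bmg code, and `to_tuples` is a hypothetical helper. It turns any Enumerable of hashes into a set of Symbol-keyed tuples, so duplicates collapse as they would in a mathematical relation.

```ruby
require "set"

# Hypothetical helper (not part of Bmg) spelling out the data model:
# a relation is a set of tuples, and a tuple is a Hash with Symbol keys.
def to_tuples(enumerable)
  enumerable.each_with_object(Set.new) do |hash, tuples|
    tuples << hash.transform_keys(&:to_sym) # "id" and :id become the same key
  end
end

rows = [{ "id" => 1 }, { id: 2 }, { "id" => 1 }]
to_tuples(rows).to_a # => [{id: 1}, {id: 2}]
```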
````diff
+### Connecting to SQL databases
 
-Bmg requires `sequel >= 3.0` to connect to SQL databases.
+Bmg currently requires `sequel >= 3.0` to connect to SQL databases. You also
+need to require `bmg/sequel`.
 
 ```ruby
 require 'sqlite3'
 require 'bmg'
 require 'bmg/sequel'
+```
 
-
+Then `Bmg.sequel` serves relations for tables of your SQL database:
 
+```ruby
+DB = Sequel.connect("sqlite://suppliers-and-parts.db")
 suppliers = Bmg.sequel(:suppliers, DB)
+```
+
+The entire algebra is available on those relations. As long as you keep using
+operators that can be translated to SQL, results remain SQL-able:
 
+```ruby
 big_suppliers = suppliers
-  .
+  .exclude(status: 30)
+  .project([:sid, :name])
 
 puts big_suppliers.to_sql
-# SELECT `t1`.`sid`, `t1`.`name
+# SELECT `t1`.`sid`, `t1`.`name` FROM `suppliers` AS 't1' WHERE (`t1`.`status` != 30)
+```
 
-
-
+Operators not translatable to SQL are available too (such as `group` below).
+Bmg fallbacks to memory operators for them, but remains capable of pushing some
+operators down the tree as illustrated below (the restriction on `:city` is
+pushed to the SQL server):
+
+```ruby
+Bmg.sequel(:suppliers, sequel_db)
+  .project([:sid, :name, :city])
+  .group([:sid, :name], :suppliers_in)
+  .restrict(city: ["Paris", "London"])
+  .debug
+
+# (group
+# (sequel SELECT `t1`.`sid`, `t1`.`name`, `t1`.`city` FROM `suppliers` AS 't1' WHERE (`t1`.`city` IN ('Paris', 'London')))
+# [:sid, :name, :status]
+# :suppliers_in
+# {:array=>false})
 ```
 
-
````
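The new README claims the restriction on `:city` can be pushed below `group`. This plain-Ruby sketch shows why that rewrite is safe. It assumes that `group([:sid, :name], :suppliers_in)` nests those two attributes and groups on the remaining one (`:city`). Under that assumption, filtering on `:city` before or after grouping yields the same relation. The `group` and `restrict` helpers are hypothetical in-memory stand-ins, not Bmg's operators.

```ruby
# Hypothetical helper: nests `attrs` of each tuple into a relation-valued
# attribute `as`, grouping on all the remaining attributes.
def group(tuples, attrs, as)
  tuples
    .group_by { |t| t.reject { |k, _| attrs.include?(k) } }
    .map { |rest, ts| rest.merge(as => ts.map { |t| t.slice(*attrs) }) }
end

# Hypothetical helper: keeps tuples whose `attr` is one of `values`.
def restrict(tuples, attr, values)
  tuples.select { |t| values.include?(t[attr]) }
end

suppliers = [
  { sid: "S1", name: "Smith", city: "London" },
  { sid: "S2", name: "Jones", city: "Paris" },
  { sid: "S3", name: "Blake", city: "Paris" },
  { sid: "S5", name: "Adams", city: "Athens" }
]
cities = ["Paris", "London"]

# restriction applied after the group, as written in the query...
after  = restrict(group(suppliers, [:sid, :name], :suppliers_in), :city, cities)
# ...and pushed below it, as a SQL WHERE would be
before = group(restrict(suppliers, :city, cities), [:sid, :name], :suppliers_in)
after == before # => true
```

The rewrite is only valid because `:city` survives the grouping as a plain attribute. A restriction on `:sid` could not be pushed down this way.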
````diff
+### Reading files (csv, excel, text)
 
-
-
-chaining relational operators is limited (yielding errors or wrong SQL
-queries). Bmg **always** allows chaining operators. If it does not, it's
-a bug. In other words, the following query is 100% valid:
+Bmg provides simple adapters to read files and reach Relationland as soon as
+possible.
 
-
-  .restrict(...) # aka where
-  .union(...)
-  .summarize(...) # aka group by
-  .restrict(...)
+#### CSV files
 
-
-
-
+```ruby
+csv_options = { col_sep: ",", quote_char: '"' }
+r = Bmg.csv("path/to/a/file.csv", csv_options)
+```
 
-
-
+Options are directly transmitted to `::CSV.new`, check ruby's standard
+library.
 
-
-autosummarize, etc.) and allows building 'non flat' relations.
````
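The CSV options in the README snippet (`col_sep`, `quote_char`) are standard `::CSV.new` options. The stdlib-only sketch below shows their effect and one way rows can become symbolized tuples. Passing `headers: true` and `header_converters: :symbol` is an assumption made for this illustration. It does not describe what `Bmg.csv` does internally.

```ruby
require "csv"

# Same options as in the README snippet: field separator and quote character.
csv_options = { col_sep: ",", quote_char: '"' }

data = <<~TEXT
  sid,name,city
  S1,Smith,London
  S2,"Jones, Jr.",Paris
TEXT

# Assumption for the illustration: the first line holds headers, converted
# to Symbols so that each CSV::Row becomes a symbolized tuple.
csv = CSV.new(data, **csv_options, headers: true, header_converters: :symbol)
tuples = csv.map(&:to_h)
# => [{sid: "S1", name: "Smith", city: "London"},
#     {sid: "S2", name: "Jones, Jr.", city: "Paris"}]
```

The quoted field `"Jones, Jr."` keeps its comma because `quote_char` is `"`.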
````diff
+#### Excel files
 
-
+You will need to add [`roo`](https://github.com/roo-rb/roo) to your Gemfile to
+read `.xls` and `.xlsx` files with Bmg.
 
-
-
+```ruby
+roo_options = { skip: 1 }
+r = Bmg.excel("path/to/a/file.xls", roo_options)
+```
 
-
-
-(critical) production systems.
+Options are directly transmitted to `Roo::Spreadsheet.open`, check roo's
+documentation.
 
-
-many more. Bmg is limited to the core algebra, main Relation abstraction
-and SQL generation.
+#### Text files
 
-
-
-left_join operator, etc.). Sharp tools hurt, use them with great care.
+There is also a straightforward way to read text files and convert lines to
+tuples.
 
-
-
+```ruby
+r = Bmg.text_file("path/to/a/file.txt")
+r.type.attrlist
+# => [:line, :text]
+```
 
-
-
-
+Without options tuples will have `:line` and `:text` attributes, the former
+being the line number (starting at 1) and the latter being the line itself
+(stripped).
+
+The are a couple of options (see `Bmg::Reader::Textfile`). The most useful one
+is the use a of a Regexp with named captures to automatically extract
+attributes:
+
+```ruby
+r = Bmg.text_file("path/to/a/file.txt", parse: /GET (?<url>([^\s]+))/)
+r.type.attrlist
+# => [:line, :url]
+```
+
+In this scenario, non matching lines are skipped. The `:line` attribute keeps
+being used to have at least one candidate key (so to speak).
+
````
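This plain-Ruby sketch mirrors the behaviour the README describes for `Bmg.text_file`:

- `:line` numbers start at 1.
- `:text` holds the stripped line.
- With a `parse:` Regexp, named captures become attributes and non-matching lines are skipped.

`text_tuples` is a hypothetical helper written for illustration. Bmg's actual reader lives in `lib/bmg/reader/text_file.rb`. Matching the Regexp against the stripped line is an assumption.

```ruby
require "stringio"

# Hypothetical re-implementation of the documented behaviour, for
# illustration only.
def text_tuples(io, parse: nil)
  io.each_line.with_index(1).filter_map do |raw, number|
    text = raw.strip
    if parse.nil?
      { line: number, text: text } # default: line number + stripped text
    elsif (m = parse.match(text))
      # named captures become Symbol-keyed attributes next to :line
      { line: number }.merge(m.named_captures.transform_keys(&:to_sym))
    end # no match: the if yields nil, which filter_map drops
  end
end

log = StringIO.new(<<~LOG)
  GET /index.html
  POST /login
  GET /about
LOG

text_tuples(log, parse: /GET (?<url>([^\s]+))/)
# => [{line: 1, url: "/index.html"}, {line: 3, url: "/about"}]
```

Line numbers are computed before filtering. Skipped lines therefore leave gaps, which keeps `:line` usable as a key.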
````diff
+### Your own relations
+
+As noted earlier, Bmg has a simple relation interface where you only have to
+provide an iteration of symbolized tuples.
+
+```ruby
+class MyRelation
+  include Bmg::Relation
+
+  def each
+    yield(id: 1, name: "Alf", year: 2014)
+    yield(id: 2, name: "Bmg", year: 2018)
+  end
+end
+
+MyRelation.new
+  .restrict(Predicate.gt(:year, 2015))
+  .allbut([:year])
+```
+
+As shown, creating adapters on top of various data source is straighforward.
+Adapters can also participate to query optimization (such as pushing
+restrictions down the tree) by overriding the underscored version of operators
+(e.g. `_restrict`).
+
+Have a look at `Bmg::Algebra` for the protocol and `Bmg::Sql::Relation` for an
+example. Keep in touch with the team if you need some help.
 
 ## Supported operators
 
````
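The README says adapters can take part in query optimization by overriding the underscored version of an operator (e.g. `_restrict`). The `algebra.rb` hunk further down shows the same public/protected split for `transform`/`_transform`. Below is a self-contained sketch of that pattern. `MiniAlgebra`, `Filtered` and `ArrayRelation` are hypothetical names. Restrictions are simplified to attribute-equality Hashes instead of Bmg's `Predicate` objects.

```ruby
# Hypothetical classes illustrating the "public operator + overridable
# underscored hook" pattern; none of them are Bmg classes.
module MiniAlgebra
  include Enumerable

  # True when `tuple` has every attribute value listed in `conditions`.
  def self.match?(tuple, conditions)
    conditions.all? { |k, v| tuple[k] == v }
  end

  # Public operator: callers always go through #restrict...
  def restrict(conditions)
    return self if conditions.empty?
    _restrict(conditions)
  end

  protected

  # ...which delegates to this hook. Default: wrap and filter in memory.
  def _restrict(conditions)
    Filtered.new(self, conditions)
  end
end

# Default in-memory restriction wrapper.
class Filtered
  include MiniAlgebra

  def initialize(source, conditions)
    @source, @conditions = source, conditions
  end

  def each
    @source.each { |t| yield(t) if MiniAlgebra.match?(t, @conditions) }
  end
end

# Adapter whose "data source" can filter by itself (think SQL WHERE).
# Overriding the hook pushes restrictions into the source instead.
class ArrayRelation
  include MiniAlgebra

  attr_reader :pushed # conditions handed to the source so far (ANDed)

  def initialize(rows, pushed = [])
    @rows, @pushed = rows, pushed
  end

  def each
    @rows.each { |t| yield(t) if @pushed.all? { |c| MiniAlgebra.match?(t, c) } }
  end

  protected

  def _restrict(conditions)
    ArrayRelation.new(@rows, @pushed + [conditions])
  end
end

r = ArrayRelation.new([{ id: 1, year: 2014 }, { id: 2, year: 2018 }])
q = r.restrict(year: 2018)
q.class # => ArrayRelation (no in-memory wrapper was created)
q.to_a  # => [{id: 2, year: 2018}]
```

Pushed conditions are kept as a list and ANDed. Restricting twice on different values of the same attribute therefore correctly yields nothing.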
````diff
@@ -114,8 +210,10 @@ r.autowrap(split: '_') # structure a flat relation, split:
 r.autosummarize([:a, :b, ...], x: :sum) # (experimental) usual summarizers supported
 r.constants(x: 12, ...) # add constant attributes (sometimes useful in unions)
 r.extend(x: ->(t){ ... }, ...) # add computed attributes
+r.exclude(predicate) # shortcut for restrict(!predicate)
 r.group([:a, :b, ...], :x) # relation-valued attribute from attributes
 r.image(right, :x, [:a, :b, ...]) # relation-valued attribute from another relation
+r.images({:x => r1, :y => r2}, [:a, ...]) # shortcut over image(r1, :x, ...).image(r2, :y, ...)
 r.join(right, [:a, :b, ...]) # natural join on a join key
 r.join(right, :a => :x, :b => :y, ...) # natural join after right reversed renaming
 r.left_join(right, [:a, :b, ...], {...}) # left join with optional default right tuple
````
````diff
@@ -132,15 +230,100 @@ r.restrict(a: "foo", b: "bar", ...) # relational restriction, aka where
 r.rxmatch([:a, :b, ...], /xxx/) # regex match kind of restriction
 r.summarize([:a, :b, ...], x: :sum) # relational summarization
 r.suffix(:_foo, but: [:a, ...]) # suffix kind of renaming
+t.transform(:to_s) # all-attrs transformation
+t.transform(&:to_s) # similar, but Proc-driven
+t.transform(:foo => :upcase, ...) # specific-attrs tranformation
+t.transform([:to_s, :upcase]) # chain-transformation
 r.union(right) # relational union
+r.where(predicate) # alias for restrict(predicate)
 ```
 
````
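The four `transform` forms listed above can be read as follows:

- A Symbol applies that method to every attribute.
- A Proc is applied to every attribute.
- A Hash applies per-attribute transformations.
- An Array chains transformations.

The sketch below applies those readings to a single tuple in plain Ruby. `transform_value` and `transform_tuple` are hypothetical helpers that only approximate the documented semantics. Bmg's own implementation is `Operator::Transform` and may accept more kinds of transformations.

```ruby
# Hypothetical helper: applies one transformation to one attribute value.
def transform_value(value, transformation)
  case transformation
  when Symbol then value.public_send(transformation) # e.g. :to_s
  when Proc   then transformation.call(value)         # e.g. &:to_s
  when Array                                          # chain, left to right
    transformation.reduce(value) { |v, t| transform_value(v, t) }
  else
    raise ArgumentError, "unsupported transformation: #{transformation.inspect}"
  end
end

# Hypothetical helper: a Hash targets specific attributes, anything else
# is applied to every attribute of the tuple.
def transform_tuple(tuple, transformation)
  if transformation.is_a?(Hash)
    tuple.to_h do |k, v|
      [k, transformation.key?(k) ? transform_value(v, transformation[k]) : v]
    end
  else
    tuple.transform_values { |v| transform_value(v, transformation) }
  end
end

t = { id: 1, name: "bmg" }
transform_tuple(t, :to_s)             # => {id: "1", name: "bmg"}
transform_tuple(t, :to_s.to_proc)     # => {id: "1", name: "bmg"}
transform_tuple(t, { name: :upcase }) # => {id: 1, name: "BMG"}
transform_tuple(t, [:to_s, :upcase])  # => {id: "1", name: "BMG"}
```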
````diff
-##
+## How is this different?
+
+### ... from similar libraries?
+
+1. The libraries you probably know (Sequel, Arel, SQLAlchemy, Korma, jOOQ,
+etc.) do not implement a genuine relational algebra. Their support for
+chaining relational operators is thus limited (restricting your expression
+power and/or raising errors and/or outputting wrong or counterintuitive
+SQL code). Bmg **always** allows chaining operators. If it does not, it's
+a bug.
+
+For instance the expression below is 100% valid in Bmg. The last where
+clause applies to the result of the summarize (while SQL requires a `HAVING`
+clause, or a `SELECT ... FROM (SELECT ...) r`).
+
+```ruby
+relation
+  .where(...)
+  .union(...)
+  .summarize(...) # aka group by
+  .where(...)
+```
+
+2. Bmg supports in memory relations, json relations, csv relations, SQL
+relations and so on. It's not tight to SQL generation, and supports
+queries accross multiple data sources.
+
+3. Bmg makes a best effort to optimize queries, simplifying both generated
+SQL code (low-level accesses to datasources) and in-memory operations.
+
+4. Bmg supports various *structuring* operators (group, image, autowrap,
+autosummarize, etc.) and allows building 'non flat' relations.
+
+5. Bmg can use full ruby power when that helps (e.g. regular expressions in
+WHERE clauses or ruby code in EXTEND clauses). This may prevent Bmg from
+delegating work to underlying data sources (e.g. SQL server) and should
+therefore be used with care though.
+
````
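Item 1 says the last `where` in the chain applies to the summarized relation, which SQL expresses with `HAVING`. Below is a small plain-Ruby illustration with made-up shipment data. `summarize` is a hypothetical helper, not Bmg's operator. The equivalent SQL is shown in a comment.

```ruby
# Hypothetical in-memory helper: groups on `by` and sums `attr` into `as`.
def summarize(tuples, by, attr, as)
  tuples
    .group_by { |t| t.slice(*by) }
    .map { |key, ts| key.merge(as => ts.sum { |t| t[attr] }) }
end

shipments = [
  { city: "London", qty: 300 }, { city: "London", qty: 200 },
  { city: "Paris",  qty: 100 }, { city: "Athens", qty: 400 }
]

# where -> summarize -> where. The first filter maps to SQL WHERE, the last
# one filters the summarized relation, i.e. SQL HAVING:
#   SELECT city, SUM(qty) AS total FROM shipments
#   WHERE qty >= 150 GROUP BY city HAVING SUM(qty) > 450
result = summarize(shipments.select { |t| t[:qty] >= 150 }, [:city], :qty, :total)
           .select { |t| t[:total] > 450 }
# => [{city: "London", total: 500}]
```

In the algebra, both filters are the same operator applied at different points of the tree. The WHERE/HAVING distinction only appears when compiling to SQL.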
````diff
+### ... from Alf?
+
+If you use Alf (or used it in the past), below are the main differences between
+Bmg and Alf. Bmg has NOT been written to be API-compatible with Alf and will
+probably never be.
+
+1. Bmg's implementation is much simpler than Alf and uses no ruby core
+extention.
+
+2. We are confident using Bmg in production. Systematic inspection of query
+plans is advised though. Alf was a bit too experimental to be used on
+(critical) production systems.
+
+3. Alf exposes a functional syntax, command line tool, restful tools and
+many more. Bmg is limited to the core algebra, main Relation abstraction
+and SQL generation.
 
-
+4. Bmg is less strict regarding conformance to relational theory, and
+may actually expose non relational features (such as support for null,
+left_join operator, etc.). Sharp tools hurt, use them with care.
+
+5. Unlike Alf::Relation instances of Bmg::Relation capture query-trees, not
+values. Currently two instances `r1` and `r2` are not equal even if they
+define the same mathematical relation. As a consequence joining on
+relation-valued attributes does not work as expected in Bmg until further
+notice.
+
+6. Bmg does not implement all operators documented on try-alf.org, even if
+we plan to eventually support most of them.
+
+7. Bmg has a few additional operators that prove very useful on real
+production use cases: prefix, suffix, autowrap, autosummarize, left_join,
+rxmatch, etc.
+
+8. Bmg optimizes queries and compiles them to SQL on the fly, while Alf was
+building an AST internally first. Strictly speaking this makes Bmg less
+powerful than Alf since optimizations cannot be turned off for now.
+
+## Contribute
+
+Please use github issues and pull requests for all questions, bug reports,
+and contributions. Don't hesitate to get in touch with us with an early code
+spike if you plan to add non trivial features.
+
+## Licence
+
+This software is distributed by Enspirit SRL under a MIT Licence. Please
+contact Bernard Lambeau (blambeau@gmail.com) with any question.
 
 Enspirit (https://enspirit.be) and Klaro App (https://klaro.cards) are both
 actively using and contributing to the library.
-
-Feel free to contact us for help, ideas and/or contributions. Please use github
-issues and pull requests if possible if code is involved.
````
data/lib/bmg.rb
CHANGED
````diff
@@ -1,6 +1,7 @@
 require 'path'
 require 'predicate'
 require 'forwardable'
+require 'set'
 module Bmg
 
   def in_memory(enumerable, type = Type::ANY)
````
````diff
@@ -8,6 +9,11 @@ module Bmg
   end
   module_function :in_memory
 
+  def text_file(path, options = {}, type = Type::ANY)
+    Reader::TextFile.new(type, path, options).spied(main_spy)
+  end
+  module_function :text_file
+
   def csv(path, options = {}, type = Type::ANY)
     Reader::Csv.new(type, path, options).spied(main_spy)
   end
````
````diff
@@ -44,6 +50,7 @@ module Bmg
   require_relative 'bmg/relation/in_memory'
   require_relative 'bmg/relation/spied'
   require_relative 'bmg/relation/materialized'
+  require_relative 'bmg/relation/proxy'
 
   # Deprecated
   Leaf = Relation::InMemory
````
data/lib/bmg/algebra.rb
CHANGED
````diff
@@ -172,6 +172,17 @@ module Bmg
     end
     protected :_summarize
 
+    def transform(transformation = nil, options = {}, &proc)
+      transformation, options = proc, (transformation || {}) unless proc.nil?
+      return self if transformation.is_a?(Hash) && transformation.empty?
+      _transform(self.type.transform(transformation, options), transformation, options)
+    end
+
+    def _transform(type, transformation, options)
+      Operator::Transform.new(type, self, transformation, options)
+    end
+    protected :_transform
+
     def union(other, options = {})
       return self if other.is_a?(Relation::Empty)
       _union self.type.union(other.type), other, options
````
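The first line of the new `transform` reshuffles arguments when a block is given. The block becomes the transformation, and the first positional argument, if any, is reinterpreted as options. The snippet below copies that line into a stand-alone method to make the effect visible. `normalize_transform_args` and the `placeholder` option key are hypothetical. The following line of `transform` then makes an empty Hash a no-op (`return self`).

```ruby
# Stand-alone copy of the argument handling in Algebra#transform above;
# it returns the normalized pair instead of building an operator.
def normalize_transform_args(transformation = nil, options = {}, &proc)
  transformation, options = proc, (transformation || {}) unless proc.nil?
  [transformation, options]
end

normalize_transform_args(:to_s)             # => [:to_s, {}]
normalize_transform_args({ name: :upcase }) # => [{name: :upcase}, {}]

# With a block, the block wins and the positional Hash becomes the options
# (`placeholder` is a made-up key, not a documented Bmg option):
transformation, options = normalize_transform_args({ placeholder: 1 }, &:upcase)
```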
data/lib/bmg/algebra/shortcuts.rb
CHANGED
````diff
@@ -2,6 +2,14 @@ module Bmg
   module Algebra
     module Shortcuts
 
+      def where(predicate)
+        restrict(predicate)
+      end
+
+      def exclude(predicate)
+        restrict(!Predicate.coerce(predicate))
+      end
+
       def rxmatch(attrs, matcher, options = {})
         predicate = attrs.inject(Predicate.contradiction){|p,a|
           p | Predicate.match(a, matcher, options)
````
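The new shortcuts are thin: `where(p)` is `restrict(p)`, and `exclude(p)` is `restrict` on the negation of `Predicate.coerce(p)`. Below is a plain-Ruby sketch of the same relationship over Arrays of Hashes. `coerce`, `restrict`, `where` and `exclude` here are hypothetical stand-ins with simplified predicates. They are not the `predicate` gem's API.

```ruby
# Hypothetical stand-in for the role of Predicate.coerce: a Hash means
# "all these attributes equal these values"; a callable is used as-is.
def coerce(predicate)
  return predicate if predicate.respond_to?(:call)
  ->(t) { predicate.all? { |k, v| t[k] == v } }
end

def restrict(tuples, predicate)
  pred = coerce(predicate)
  tuples.select { |t| pred.call(t) }
end

# where: plain alias of restrict, like Shortcuts#where
def where(tuples, predicate)
  restrict(tuples, predicate)
end

# exclude: restrict on the negated predicate, like Shortcuts#exclude
def exclude(tuples, predicate)
  pred = coerce(predicate)
  restrict(tuples, ->(t) { !pred.call(t) })
end

suppliers = [
  { sid: "S1", status: 20 },
  { sid: "S2", status: 10 },
  { sid: "S3", status: 30 }
]
exclude(suppliers, { status: 30 }).map { |t| t[:sid] } # => ["S1", "S2"]
where(suppliers, { status: 30 }).map { |t| t[:sid] }   # => ["S3"]
```

Because the whole coerced predicate is negated, excluding a multi-attribute Hash removes only the tuples that match all of its attributes.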
````diff
@@ -31,6 +39,12 @@ module Bmg
         self.image(right.rename(renaming), as, on.keys, options)
       end
 
+      def images(rights, on = [], options = {})
+        rights.each_pair.inject(self){|memo,(as,right)|
+          memo.image(right, as, on, options)
+        }
+      end
+
       def join(right, on = [])
         return super unless on.is_a?(Hash)
         renaming = Hash[on.map{|k,v| [v,k] }]
````
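`images` folds a Hash of `as => right` pairs into successive `image` calls, exactly as the new `inject` above does. The sketch reproduces that fold over plain Ruby data. The `image` helper is hypothetical and assumes the classical definition: matching right tuples are nested and the join attributes are dropped. Bmg's `image` takes extra options and may differ in details.

```ruby
# Hypothetical in-memory image: for each left tuple, nest the right tuples
# that agree with it on `on`, without the `on` attributes.
def image(left, right, as, on)
  left.map do |l|
    matches = right.select { |r| on.all? { |a| r[a] == l[a] } }
    l.merge(as => matches.map { |r| r.reject { |k, _| on.include?(k) } })
  end
end

# Same folding strategy as Shortcuts#images: one image per (as, right) pair.
def images(left, rights, on)
  rights.each_pair.inject(left) { |memo, (as, right)| image(memo, right, as, on) }
end

suppliers = [{ sid: "S1" }, { sid: "S2" }]
shipments = [{ sid: "S1", pid: "P1" }, { sid: "S1", pid: "P2" }]
contacts  = [{ sid: "S2", email: "s2@example.com" }]

images(suppliers, { shipments: shipments, contacts: contacts }, [:sid])
# => [{sid: "S1", shipments: [{pid: "P1"}, {pid: "P2"}], contacts: []},
#     {sid: "S2", shipments: [], contacts: [{email: "s2@example.com"}]}]
```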