bmg 0.17.5 → 0.18.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: faea9567e3ec11347ccd8d1e063d027a720095bc8623221597a40e26aade3d99
-   data.tar.gz: 01b649d2810c6460822c06c83ce06911d438f68cc9b8d297afb766c389619544
+   metadata.gz: 1e47d2972990cc85ae0a3b99fa6a79caf97176e46c7d70cdbbbb35ffdb3df21a
+   data.tar.gz: dd7ddf0c123a9347f517da44cc923ec9664c8a1fd12b9e8f0ed08677be2dcdd4
  SHA512:
-   metadata.gz: e75f1778fc7fd0578b37fda44c6b2ccf5f5aa4441c16efeb42b495c38bc9361495b6597d70ce0ea8829caceaceb708c247c24b8531464d6c16f8f3faef1ee01f
-   data.tar.gz: 3762248213c27bf9e11276fe3a2b9deb75a124e80cc2f7cdf8fbe9a70856649f85fe55eb71474ae6f4a2836fd3d2d144ff6ae44b7b3810ff08bde51d813136a1
+   metadata.gz: 929f4c67a01d756b484e9ae27e5465173a4a1850686a173a8ab293dbe1e1ba01899c1ab21a8e2f29251378d0354287e379e7573ae551d3aef0583a0690640375
+   data.tar.gz: 3a92ab6c7708badf0464a94c496000eb55d233fb979bfd7128b71f3229347fa56c38be381815dbbc7eb0f741b58138f89551080e7670b3b044dfbba6571ac892
data/Gemfile CHANGED
@@ -1,5 +1,2 @@
  source "https://rubygems.org"
  gemspec
-
- # gem "predicate", github: "enspirit/predicate", branch: "placeholders"
- # gem "predicate", path: "../predicate"
data/README.md CHANGED
@@ -1,16 +1,30 @@
  # Bmg, a relational algebra (Alf's successor)!

+ [![Build Status](https://travis-ci.com/enspirit/bmg.svg?branch=master)](https://travis-ci.com/enspirit/bmg)
+
  Bmg is a relational algebra implemented as a ruby library. It implements the
  [Relation as First-Class Citizen](http://www.try-alf.org/blog/2013-10-21-relations-as-first-class-citizen)
- paradigm contributed with Alf a few years ago.
-
- Like Alf, Bmg can be used to query relations in memory, from various files,
- SQL databases, and any data sources that can be seen as serving relations.
- Cross data-sources joins are supported, as with Alf.
-
- Unlike Alf, Bmg does not make any core ruby extension and exposes the
- object-oriented syntax only (not Alf's functional one). Bmg implementation is
- also much simpler, and make its easier to implement user-defined relations.
+ paradigm contributed with [Alf](http://www.try-alf.org/) a few years ago.
+
+ Bmg can be used to query relations in memory, from various files, SQL databases,
+ and any data source that can be seen as serving relations. Cross data-sources
+ joins are supported, as with Alf. For differences with Alf, see a section
+ further down this README.
+
+ ## Outline
+
+ * [Example](#example)
+ * [Where are base relations coming from?](#where-are-base-relations-coming-from)
+   * [Memory relations](#memory-relations)
+   * [Connecting to SQL databases](#connecting-to-sql-databases)
+   * [Reading files (csv, excel, text)](#reading-files-csv-excel-text)
+   * [Your own relations](#your-own-relations)
+ * [List of supported operators](#supported-operators)
+ * [How is this different?](#how-is-this-different)
+   * [... from similar libraries](#-from-similar-libraries)
+   * [... from Alf](#-from-alf)
+ * [Contribute](#contribute)
+ * [License](#license)

  ## Example

@@ -27,7 +41,7 @@ suppliers = Bmg::Relation.new([
  ])

  by_city = suppliers
-   .restrict(Predicate.neq(status: 30))
+   .exclude(status: 30)
    .extend(upname: ->(t){ t[:name].upcase })
    .group([:sid, :name, :status], :suppliers_in)

@@ -35,76 +49,158 @@ puts JSON.pretty_generate(by_city)
  # [{...},...]
  ```

- ## Connecting to a SQL database
+ ## Where are base relations coming from?
+
+ Bmg sees relations as sets/enumerable of symbolized Ruby hashes. The following
+ sections show you how to get them in the first place, to enter Relationland.
+
+ ### Memory relations
+
+ If you have an Array of Hashes -- in fact any Enumerable -- you can easily get
+ a Relation using either `Bmg::Relation.new` or `Bmg.in_memory`.
+
+ ```ruby
+ # this...
+ r = Bmg::Relation.new [{id: 1}, {id: 2}]
+
+ # is the same as this...
+ r = Bmg.in_memory [{id: 1}, {id: 2}]
+
+ # entire algebra is available on `r`
+ ```
+
+ ### Connecting to SQL databases

- Bmg requires `sequel >= 3.0` to connect to SQL databases.
+ Bmg currently requires `sequel >= 3.0` to connect to SQL databases. You also
+ need to require `bmg/sequel`.

  ```ruby
  require 'sqlite3'
  require 'bmg'
  require 'bmg/sequel'
+ ```

- DB = Sequel.connect("sqlite://suppliers-and-parts.db")
+ Then `Bmg.sequel` serves relations for tables of your SQL database:

+ ```ruby
+ DB = Sequel.connect("sqlite://suppliers-and-parts.db")
  suppliers = Bmg.sequel(:suppliers, DB)
+ ```
+
+ The entire algebra is available on those relations. As long as you keep using
+ operators that can be translated to SQL, results remain SQL-able:

+ ```ruby
  big_suppliers = suppliers
-   .restrict(Predicate.neq(status: 30))
+   .exclude(status: 30)
+   .project([:sid, :name])

  puts big_suppliers.to_sql
- # SELECT `t1`.`sid`, `t1`.`name`, `t1`.`status`, `t1`.`city` FROM `suppliers` AS 't1' WHERE (`t1`.`status` != 30)
+ # SELECT `t1`.`sid`, `t1`.`name` FROM `suppliers` AS 't1' WHERE (`t1`.`status` != 30)
+ ```

- puts JSON.pretty_generate(big_suppliers)
- # [{...},...]
+ Operators not translatable to SQL are available too (such as `group` below).
+ Bmg falls back to memory operators for them, but remains capable of pushing some
+ operators down the tree as illustrated below (the restriction on `:city` is
+ pushed to the SQL server):
+
+ ```ruby
+ Bmg.sequel(:suppliers, sequel_db)
+   .project([:sid, :name, :city])
+   .group([:sid, :name], :suppliers_in)
+   .restrict(city: ["Paris", "London"])
+   .debug
+
+ # (group
+ #   (sequel SELECT `t1`.`sid`, `t1`.`name`, `t1`.`city` FROM `suppliers` AS 't1' WHERE (`t1`.`city` IN ('Paris', 'London')))
+ #   [:sid, :name]
+ #   :suppliers_in
+ #   {:array=>false})
  ```

- ## How is this different from similar libraries?
+ ### Reading files (csv, excel, text)

- 1. The libraries you probably know (Sequel, Arel, SQLAlchemy, Korma, jOOQ,
-    etc.) do not implement a genuine relational algebra: their support for
-    chaining relational operators is limited (yielding errors or wrong SQL
-    queries). Bmg **always** allows chaining operators. If it does not, it's
-    a bug. In other words, the following query is 100% valid:
+ Bmg provides simple adapters to read files and reach Relationland as soon as
+ possible.

-        relation
-          .restrict(...) # aka where
-          .union(...)
-          .summarize(...) # aka group by
-          .restrict(...)
+ #### CSV files

- 2. Bmg supports in memory relations, json relations, csv relations, SQL
-    relations and so on. It's not tight to SQL generation, and supports
-    queries accross multiple data sources.
+ ```ruby
+ csv_options = { col_sep: ",", quote_char: '"' }
+ r = Bmg.csv("path/to/a/file.csv", csv_options)
+ ```

- 3. Bmg makes a best effort to optimize queries, simplifying both generated
-    SQL code (low-level accesses to datasources) and in-memory operations.
+ Options are directly transmitted to `::CSV.new`, check ruby's standard
+ library.

- 4. Bmg supports various *structuring* operators (group, image, autowrap,
-    autosummarize, etc.) and allows building 'non flat' relations.
+ #### Excel files

- ## How is this different from Alf?
+ You will need to add [`roo`](https://github.com/roo-rb/roo) to your Gemfile to
+ read `.xls` and `.xlsx` files with Bmg.

- 1. Bmg's implementation is much simpler than Alf, and uses no ruby core
-    extention.
+ ```ruby
+ roo_options = { skip: 1 }
+ r = Bmg.excel("path/to/a/file.xls", roo_options)
+ ```

- 2. We are confident using Bmg in production. Systematic inspection of query
-    plans is suggested though. Alf was a bit too experimental to be used on
-    (critical) production systems.
+ Options are directly transmitted to `Roo::Spreadsheet.open`, check roo's
+ documentation.

- 2. Alf exposes a functional syntax, command line tool, restful tools and
-    many more. Bmg is limited to the core algebra, main Relation abstraction
-    and SQL generation.
+ #### Text files

- 3. Bmg is less strict regarding conformance to relational theory, and
-    may actually expose non relational features (such as support for null,
-    left_join operator, etc.). Sharp tools hurt, use them with great care.
+ There is also a straightforward way to read text files and convert lines to
+ tuples.

- 4. Bmg does not yet implement all operators documented on try-alf.org, even
-    if we plan to eventually support them all.
+ ```ruby
+ r = Bmg.text_file("path/to/a/file.txt")
+ r.type.attrlist
+ # => [:line, :text]
+ ```

- 5. Bmg has a few additional operators that prove very useful on real
-    production use cases: prefix, suffix, autowrap, autosummarize, left_join,
-    rxmatch, etc.
+ Without options, tuples will have `:line` and `:text` attributes, the former
+ being the line number (starting at 1) and the latter being the line itself
+ (stripped).
+
+ There are a couple of options (see `Bmg::Reader::TextFile`). The most useful one
+ is the use of a Regexp with named captures to automatically extract
+ attributes:
+
+ ```ruby
+ r = Bmg.text_file("path/to/a/file.txt", parse: /GET (?<url>([^\s]+))/)
+ r.type.attrlist
+ # => [:line, :url]
+ ```
+
+ In this scenario, non matching lines are skipped. The `:line` attribute keeps
+ being used to have at least one candidate key (so to speak).
+
+ ### Your own relations
+
+ As noted earlier, Bmg has a simple relation interface where you only have to
+ provide an iteration of symbolized tuples.
+
+ ```ruby
+ class MyRelation
+   include Bmg::Relation
+
+   def each
+     yield(id: 1, name: "Alf", year: 2014)
+     yield(id: 2, name: "Bmg", year: 2018)
+   end
+ end
+
+ MyRelation.new
+   .restrict(Predicate.gt(:year, 2015))
+   .allbut([:year])
+ ```
+
+ As shown, creating adapters on top of various data sources is straightforward.
+ Adapters can also participate in query optimization (such as pushing
+ restrictions down the tree) by overriding the underscored version of operators
+ (e.g. `_restrict`).
+
+ Have a look at `Bmg::Algebra` for the protocol and `Bmg::Sql::Relation` for an
+ example. Keep in touch with the team if you need some help.

  ## Supported operators

@@ -114,8 +210,10 @@ r.autowrap(split: '_') # structure a flat relation, split:
  r.autosummarize([:a, :b, ...], x: :sum) # (experimental) usual summarizers supported
  r.constants(x: 12, ...) # add constant attributes (sometimes useful in unions)
  r.extend(x: ->(t){ ... }, ...) # add computed attributes
+ r.exclude(predicate) # shortcut for restrict(!predicate)
  r.group([:a, :b, ...], :x) # relation-valued attribute from attributes
  r.image(right, :x, [:a, :b, ...]) # relation-valued attribute from another relation
+ r.images({:x => r1, :y => r2}, [:a, ...]) # shortcut over image(r1, :x, ...).image(r2, :y, ...)
  r.join(right, [:a, :b, ...]) # natural join on a join key
  r.join(right, :a => :x, :b => :y, ...) # natural join after right reversed renaming
  r.left_join(right, [:a, :b, ...], {...}) # left join with optional default right tuple
@@ -132,15 +230,100 @@ r.restrict(a: "foo", b: "bar", ...) # relational restriction, aka where
  r.rxmatch([:a, :b, ...], /xxx/) # regex match kind of restriction
  r.summarize([:a, :b, ...], x: :sum) # relational summarization
  r.suffix(:_foo, but: [:a, ...]) # suffix kind of renaming
+ r.transform(:to_s) # all-attrs transformation
+ r.transform(&:to_s) # similar, but Proc-driven
+ r.transform(:foo => :upcase, ...) # specific-attrs transformation
+ r.transform([:to_s, :upcase]) # chain-transformation
  r.union(right) # relational union
+ r.where(predicate) # alias for restrict(predicate)
  ```

- ## Who is behind Bmg?
+ ## How is this different?
+
+ ### ... from similar libraries?
+
+ 1. The libraries you probably know (Sequel, Arel, SQLAlchemy, Korma, jOOQ,
+    etc.) do not implement a genuine relational algebra. Their support for
+    chaining relational operators is thus limited (restricting your expression
+    power and/or raising errors and/or outputting wrong or counterintuitive
+    SQL code). Bmg **always** allows chaining operators. If it does not, it's
+    a bug.
+
+    For instance the expression below is 100% valid in Bmg. The last where
+    clause applies to the result of the summarize (while SQL requires a `HAVING`
+    clause, or a `SELECT ... FROM (SELECT ...) r`).
+
+    ```ruby
+    relation
+      .where(...)
+      .union(...)
+      .summarize(...) # aka group by
+      .where(...)
+    ```
+
+ 2. Bmg supports in memory relations, json relations, csv relations, SQL
+    relations and so on. It's not tied to SQL generation, and supports
+    queries across multiple data sources.
+
+ 3. Bmg makes a best effort to optimize queries, simplifying both generated
+    SQL code (low-level accesses to datasources) and in-memory operations.
+
+ 4. Bmg supports various *structuring* operators (group, image, autowrap,
+    autosummarize, etc.) and allows building 'non flat' relations.
+
+ 5. Bmg can use full ruby power when that helps (e.g. regular expressions in
+    WHERE clauses or ruby code in EXTEND clauses). This may prevent Bmg from
+    delegating work to underlying data sources (e.g. SQL server) and should
+    therefore be used with care though.
+
+ ### ... from Alf?
+
+ If you use Alf (or used it in the past), below are the main differences between
+ Bmg and Alf. Bmg has NOT been written to be API-compatible with Alf and will
+ probably never be.
+
+ 1. Bmg's implementation is much simpler than Alf and uses no ruby core
+    extension.
+
+ 2. We are confident using Bmg in production. Systematic inspection of query
+    plans is advised though. Alf was a bit too experimental to be used on
+    (critical) production systems.
+
+ 3. Alf exposes a functional syntax, command line tool, restful tools and
+    many more. Bmg is limited to the core algebra, main Relation abstraction
+    and SQL generation.

- Bernard Lambeau (bernard@klaro.cards) is Alf & Bmg main engineer & maintainer.
+ 4. Bmg is less strict regarding conformance to relational theory, and
+    may actually expose non relational features (such as support for null,
+    left_join operator, etc.). Sharp tools hurt, use them with care.
+
+ 5. Unlike Alf::Relation, instances of Bmg::Relation capture query-trees, not
+    values. Currently two instances `r1` and `r2` are not equal even if they
+    define the same mathematical relation. As a consequence, joining on
+    relation-valued attributes does not work as expected in Bmg until further
+    notice.
+
+ 6. Bmg does not implement all operators documented on try-alf.org, even if
+    we plan to eventually support most of them.
+
+ 7. Bmg has a few additional operators that prove very useful on real
+    production use cases: prefix, suffix, autowrap, autosummarize, left_join,
+    rxmatch, etc.
+
+ 8. Bmg optimizes queries and compiles them to SQL on the fly, while Alf was
+    building an AST internally first. Strictly speaking this makes Bmg less
+    powerful than Alf since optimizations cannot be turned off for now.
+
+ ## Contribute
+
+ Please use github issues and pull requests for all questions, bug reports,
+ and contributions. Don't hesitate to get in touch with us with an early code
+ spike if you plan to add non trivial features.
+
+ ## Licence
+
+ This software is distributed by Enspirit SRL under an MIT Licence. Please
+ contact Bernard Lambeau (blambeau@gmail.com) with any question.

  Enspirit (https://enspirit.be) and Klaro App (https://klaro.cards) are both
  actively using and contributing to the library.
-
- Feel free to contact us for help, ideas and/or contributions. Please use github
- issues and pull requests if possible if code is involved.
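
Reading note on the README changes above: the new `exclude(predicate)` is documented as a shortcut for `restrict(!predicate)`, and `where` as an alias for `restrict`. The sketch below mimics those semantics on plain hashes so they can be tried without installing the gem. `TinyRelation` is a hypothetical stand-in, not Bmg's class, and it only handles the hash-of-equalities form of a predicate.

```ruby
require 'set'

# Hypothetical stand-in for a relation: a set of symbolized hashes.
# This is NOT Bmg::Relation, only an illustration of the semantics.
class TinyRelation
  include Enumerable

  def initialize(tuples)
    @tuples = tuples.to_set
  end

  def each(&block)
    @tuples.each(&block)
  end

  # Keep tuples whose attributes equal every key/value pair in `conditions`.
  def restrict(conditions)
    TinyRelation.new(select { |t| matches?(t, conditions) })
  end
  alias_method :where, :restrict

  # Keep tuples that do NOT match `conditions` (the negated restriction).
  def exclude(conditions)
    TinyRelation.new(reject { |t| matches?(t, conditions) })
  end

  private

  def matches?(tuple, conditions)
    conditions.all? { |attr, value| tuple[attr] == value }
  end
end

suppliers = TinyRelation.new([
  { sid: "S1", status: 20 },
  { sid: "S2", status: 30 },
  { sid: "S3", status: 30 }
])

kept    = suppliers.exclude(status: 30).to_a
matched = suppliers.where(status: 30).to_a
```

Here `kept` and `matched` partition the original tuples, which is the property the `exclude` shortcut relies on.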
data/lib/bmg.rb CHANGED
@@ -1,6 +1,7 @@
  require 'path'
  require 'predicate'
  require 'forwardable'
+ require 'set'
  module Bmg

    def in_memory(enumerable, type = Type::ANY)
@@ -8,6 +9,11 @@ module Bmg
    end
    module_function :in_memory

+   def text_file(path, options = {}, type = Type::ANY)
+     Reader::TextFile.new(type, path, options).spied(main_spy)
+   end
+   module_function :text_file
+
    def csv(path, options = {}, type = Type::ANY)
      Reader::Csv.new(type, path, options).spied(main_spy)
    end
@@ -44,6 +50,7 @@ module Bmg
    require_relative 'bmg/relation/in_memory'
    require_relative 'bmg/relation/spied'
    require_relative 'bmg/relation/materialized'
+   require_relative 'bmg/relation/proxy'

    # Deprecated
    Leaf = Relation::InMemory
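
Reading note on `text_file`: the README part of this diff says that a `parse:` Regexp with named captures yields one tuple per matching line, with a `:line` number starting at 1, and that non matching lines are skipped. The reader's implementation is not shown in this diff, so the sketch below is only an assumption about how that behaviour could be obtained in plain Ruby. `parse_lines` is a hypothetical helper, not part of Bmg.

```ruby
# Hypothetical helper, not Bmg's reader: turns text lines into symbolized
# tuples using a Regexp with named captures, skipping non-matching lines.
def parse_lines(lines, rx)
  tuples = lines.each_with_index.map do |text, index|
    match = rx.match(text.strip)
    next unless match # non-matching lines produce nil, removed below

    # Line numbers start at 1, as the README states for `:line`.
    tuple = { line: index + 1 }
    match.names.each { |name| tuple[name.to_sym] = match[name] }
    tuple
  end
  tuples.compact
end

log = [
  "GET /index.html",
  "POST /login",
  "GET /about"
]

tuples = parse_lines(log, /GET (?<url>([^\s]+))/)
```

With this input, only the first and third lines match, so `tuples` holds two hashes with `:line` and `:url` keys.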
data/lib/bmg/algebra.rb CHANGED
@@ -172,6 +172,17 @@ module Bmg
      end
      protected :_summarize

+     def transform(transformation = nil, options = {}, &proc)
+       transformation, options = proc, (transformation || {}) unless proc.nil?
+       return self if transformation.is_a?(Hash) && transformation.empty?
+       _transform(self.type.transform(transformation, options), transformation, options)
+     end
+
+     def _transform(type, transformation, options)
+       Operator::Transform.new(type, self, transformation, options)
+     end
+     protected :_transform
+
      def union(other, options = {})
        return self if other.is_a?(Relation::Empty)
        _union self.type.union(other.type), other, options
@@ -2,6 +2,14 @@ module Bmg
    module Algebra
      module Shortcuts

+       def where(predicate)
+         restrict(predicate)
+       end
+
+       def exclude(predicate)
+         restrict(!Predicate.coerce(predicate))
+       end
+
        def rxmatch(attrs, matcher, options = {})
          predicate = attrs.inject(Predicate.contradiction){|p,a|
            p | Predicate.match(a, matcher, options)
@@ -31,6 +39,12 @@ module Bmg
          self.image(right.rename(renaming), as, on.keys, options)
        end

+       def images(rights, on = [], options = {})
+         rights.each_pair.inject(self){|memo,(as,right)|
+           memo.image(right, as, on, options)
+         }
+       end
+
        def join(right, on = [])
          return super unless on.is_a?(Hash)
          renaming = Hash[on.map{|k,v| [v,k] }]