bmg 0.17.5 → 0.18.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: faea9567e3ec11347ccd8d1e063d027a720095bc8623221597a40e26aade3d99
-   data.tar.gz: 01b649d2810c6460822c06c83ce06911d438f68cc9b8d297afb766c389619544
+   metadata.gz: 1e47d2972990cc85ae0a3b99fa6a79caf97176e46c7d70cdbbbb35ffdb3df21a
+   data.tar.gz: dd7ddf0c123a9347f517da44cc923ec9664c8a1fd12b9e8f0ed08677be2dcdd4
  SHA512:
-   metadata.gz: e75f1778fc7fd0578b37fda44c6b2ccf5f5aa4441c16efeb42b495c38bc9361495b6597d70ce0ea8829caceaceb708c247c24b8531464d6c16f8f3faef1ee01f
-   data.tar.gz: 3762248213c27bf9e11276fe3a2b9deb75a124e80cc2f7cdf8fbe9a70856649f85fe55eb71474ae6f4a2836fd3d2d144ff6ae44b7b3810ff08bde51d813136a1
+   metadata.gz: 929f4c67a01d756b484e9ae27e5465173a4a1850686a173a8ab293dbe1e1ba01899c1ab21a8e2f29251378d0354287e379e7573ae551d3aef0583a0690640375
+   data.tar.gz: 3a92ab6c7708badf0464a94c496000eb55d233fb979bfd7128b71f3229347fa56c38be381815dbbc7eb0f741b58138f89551080e7670b3b044dfbba6571ac892
data/Gemfile CHANGED
@@ -1,5 +1,2 @@
  source "https://rubygems.org"
  gemspec
-
- # gem "predicate", github: "enspirit/predicate", branch: "placeholders"
- # gem "predicate", path: "../predicate"
data/README.md CHANGED
@@ -1,16 +1,30 @@
  # Bmg, a relational algebra (Alf's successor)!

+ [![Build Status](https://travis-ci.com/enspirit/bmg.svg?branch=master)](https://travis-ci.com/enspirit/bmg)
+
  Bmg is a relational algebra implemented as a ruby library. It implements the
  [Relation as First-Class Citizen](http://www.try-alf.org/blog/2013-10-21-relations-as-first-class-citizen)
- paradigm contributed with Alf a few years ago.
-
- Like Alf, Bmg can be used to query relations in memory, from various files,
- SQL databases, and any data sources that can be seen as serving relations.
- Cross data-sources joins are supported, as with Alf.
-
- Unlike Alf, Bmg does not make any core ruby extension and exposes the
- object-oriented syntax only (not Alf's functional one). Bmg implementation is
- also much simpler, and make its easier to implement user-defined relations.
+ paradigm contributed with [Alf](http://www.try-alf.org/) a few years ago.
+
+ Bmg can be used to query relations in memory, from various files, SQL databases,
+ and any data source that can be seen as serving relations. Cross data-sources
+ joins are supported, as with Alf. For differences with Alf, see a section
+ further down this README.
+
+ ## Outline
+
+ * [Example](#example)
+ * [Where are base relations coming from?](#where-are-base-relations-coming-from)
+   * [Memory relations](#memory-relations)
+   * [Connecting to SQL databases](#connecting-to-sql-databases)
+   * [Reading files (csv, excel, text)](#reading-files-csv-excel-text)
+   * [Your own relations](#your-own-relations)
+ * [List of supported operators](#supported-operators)
+ * [How is this different?](#how-is-this-different)
+   * [... from similar libraries](#-from-similar-libraries)
+   * [... from Alf](#-from-alf)
+ * [Contribute](#contribute)
+ * [License](#license)

  ## Example

@@ -27,7 +41,7 @@ suppliers = Bmg::Relation.new([
  ])

  by_city = suppliers
-   .restrict(Predicate.neq(status: 30))
+   .exclude(status: 30)
    .extend(upname: ->(t){ t[:name].upcase })
    .group([:sid, :name, :status], :suppliers_in)

@@ -35,76 +49,158 @@ puts JSON.pretty_generate(by_city)
  # [{...},...]
  ```

- ## Connecting to a SQL database
+ ## Where are base relations coming from?
+
+ Bmg sees relations as sets/enumerable of symbolized Ruby hashes. The following
+ sections show you how to get them in the first place, to enter Relationland.
+
+ ### Memory relations
+
+ If you have an Array of Hashes -- in fact any Enumerable -- you can easily get
+ a Relation using either `Bmg::Relation.new` or `Bmg.in_memory`.
+
+ ```ruby
+ # this...
+ r = Bmg::Relation.new [{id: 1}, {id: 2}]
+
+ # is the same as this...
+ r = Bmg.in_memory [{id: 1}, {id: 2}]
+
+ # entire algebra is available on `r`
+ ```
+
+ ### Connecting to SQL databases

- Bmg requires `sequel >= 3.0` to connect to SQL databases.
+ Bmg currently requires `sequel >= 3.0` to connect to SQL databases. You also
+ need to require `bmg/sequel`.

  ```ruby
  require 'sqlite3'
  require 'bmg'
  require 'bmg/sequel'
+ ```

- DB = Sequel.connect("sqlite://suppliers-and-parts.db")
+ Then `Bmg.sequel` serves relations for tables of your SQL database:

+ ```ruby
+ DB = Sequel.connect("sqlite://suppliers-and-parts.db")
  suppliers = Bmg.sequel(:suppliers, DB)
+ ```
+
+ The entire algebra is available on those relations. As long as you keep using
+ operators that can be translated to SQL, results remain SQL-able:

+ ```ruby
  big_suppliers = suppliers
-   .restrict(Predicate.neq(status: 30))
+   .exclude(status: 30)
+   .project([:sid, :name])

  puts big_suppliers.to_sql
- # SELECT `t1`.`sid`, `t1`.`name`, `t1`.`status`, `t1`.`city` FROM `suppliers` AS 't1' WHERE (`t1`.`status` != 30)
+ # SELECT `t1`.`sid`, `t1`.`name` FROM `suppliers` AS 't1' WHERE (`t1`.`status` != 30)
+ ```

- puts JSON.pretty_generate(big_suppliers)
- # [{...},...]
+ Operators not translatable to SQL are available too (such as `group` below).
+ Bmg fallbacks to memory operators for them, but remains capable of pushing some
+ operators down the tree as illustrated below (the restriction on `:city` is
+ pushed to the SQL server):
+
+ ```ruby
+ Bmg.sequel(:suppliers, sequel_db)
+   .project([:sid, :name, :city])
+   .group([:sid, :name], :suppliers_in)
+   .restrict(city: ["Paris", "London"])
+   .debug
+
+ # (group
+ #   (sequel SELECT `t1`.`sid`, `t1`.`name`, `t1`.`city` FROM `suppliers` AS 't1' WHERE (`t1`.`city` IN ('Paris', 'London')))
+ #   [:sid, :name, :status]
+ #   :suppliers_in
+ #   {:array=>false})
  ```

- ## How is this different from similar libraries?
+ ### Reading files (csv, excel, text)

- 1. The libraries you probably know (Sequel, Arel, SQLAlchemy, Korma, jOOQ,
-    etc.) do not implement a genuine relational algebra: their support for
-    chaining relational operators is limited (yielding errors or wrong SQL
-    queries). Bmg **always** allows chaining operators. If it does not, it's
-    a bug. In other words, the following query is 100% valid:
+ Bmg provides simple adapters to read files and reach Relationland as soon as
+ possible.

-        relation
-          .restrict(...) # aka where
-          .union(...)
-          .summarize(...) # aka group by
-          .restrict(...)
+ #### CSV files

- 2. Bmg supports in memory relations, json relations, csv relations, SQL
-    relations and so on. It's not tight to SQL generation, and supports
-    queries accross multiple data sources.
+ ```ruby
+ csv_options = { col_sep: ",", quote_char: '"' }
+ r = Bmg.csv("path/to/a/file.csv", csv_options)
+ ```

- 3. Bmg makes a best effort to optimize queries, simplifying both generated
-    SQL code (low-level accesses to datasources) and in-memory operations.
+ Options are directly transmitted to `::CSV.new`, check ruby's standard
+ library.

- 4. Bmg supports various *structuring* operators (group, image, autowrap,
-    autosummarize, etc.) and allows building 'non flat' relations.
+ #### Excel files

- ## How is this different from Alf?
+ You will need to add [`roo`](https://github.com/roo-rb/roo) to your Gemfile to
+ read `.xls` and `.xlsx` files with Bmg.

- 1. Bmg's implementation is much simpler than Alf, and uses no ruby core
-    extention.
+ ```ruby
+ roo_options = { skip: 1 }
+ r = Bmg.excel("path/to/a/file.xls", roo_options)
+ ```

- 2. We are confident using Bmg in production. Systematic inspection of query
-    plans is suggested though. Alf was a bit too experimental to be used on
-    (critical) production systems.
+ Options are directly transmitted to `Roo::Spreadsheet.open`, check roo's
+ documentation.

- 2. Alf exposes a functional syntax, command line tool, restful tools and
-    many more. Bmg is limited to the core algebra, main Relation abstraction
-    and SQL generation.
+ #### Text files

- 3. Bmg is less strict regarding conformance to relational theory, and
-    may actually expose non relational features (such as support for null,
-    left_join operator, etc.). Sharp tools hurt, use them with great care.
+ There is also a straightforward way to read text files and convert lines to
+ tuples.

- 4. Bmg does not yet implement all operators documented on try-alf.org, even
-    if we plan to eventually support them all.
+ ```ruby
+ r = Bmg.text_file("path/to/a/file.txt")
+ r.type.attrlist
+ # => [:line, :text]
+ ```

- 5. Bmg has a few additional operators that prove very useful on real
-    production use cases: prefix, suffix, autowrap, autosummarize, left_join,
-    rxmatch, etc.
+ Without options tuples will have `:line` and `:text` attributes, the former
+ being the line number (starting at 1) and the latter being the line itself
+ (stripped).
+
+ The are a couple of options (see `Bmg::Reader::Textfile`). The most useful one
+ is the use a of a Regexp with named captures to automatically extract
+ attributes:
+
+ ```ruby
+ r = Bmg.text_file("path/to/a/file.txt", parse: /GET (?<url>([^\s]+))/)
+ r.type.attrlist
+ # => [:line, :url]
+ ```
+
+ In this scenario, non matching lines are skipped. The `:line` attribute keeps
+ being used to have at least one candidate key (so to speak).
+
+ ### Your own relations
+
+ As noted earlier, Bmg has a simple relation interface where you only have to
+ provide an iteration of symbolized tuples.
+
+ ```ruby
+ class MyRelation
+   include Bmg::Relation
+
+   def each
+     yield(id: 1, name: "Alf", year: 2014)
+     yield(id: 2, name: "Bmg", year: 2018)
+   end
+ end
+
+ MyRelation.new
+   .restrict(Predicate.gt(:year, 2015))
+   .allbut([:year])
+ ```
+
+ As shown, creating adapters on top of various data source is straighforward.
+ Adapters can also participate to query optimization (such as pushing
+ restrictions down the tree) by overriding the underscored version of operators
+ (e.g. `_restrict`).
+
+ Have a look at `Bmg::Algebra` for the protocol and `Bmg::Sql::Relation` for an
+ example. Keep in touch with the team if you need some help.

  ## Supported operators

@@ -114,8 +210,10 @@ r.autowrap(split: '_') # structure a flat relation, split:
  r.autosummarize([:a, :b, ...], x: :sum) # (experimental) usual summarizers supported
  r.constants(x: 12, ...) # add constant attributes (sometimes useful in unions)
  r.extend(x: ->(t){ ... }, ...) # add computed attributes
+ r.exclude(predicate) # shortcut for restrict(!predicate)
  r.group([:a, :b, ...], :x) # relation-valued attribute from attributes
  r.image(right, :x, [:a, :b, ...]) # relation-valued attribute from another relation
+ r.images({:x => r1, :y => r2}, [:a, ...]) # shortcut over image(r1, :x, ...).image(r2, :y, ...)
  r.join(right, [:a, :b, ...]) # natural join on a join key
  r.join(right, :a => :x, :b => :y, ...) # natural join after right reversed renaming
  r.left_join(right, [:a, :b, ...], {...}) # left join with optional default right tuple
@@ -132,15 +230,100 @@ r.restrict(a: "foo", b: "bar", ...) # relational restriction, aka where
  r.rxmatch([:a, :b, ...], /xxx/) # regex match kind of restriction
  r.summarize([:a, :b, ...], x: :sum) # relational summarization
  r.suffix(:_foo, but: [:a, ...]) # suffix kind of renaming
+ t.transform(:to_s) # all-attrs transformation
+ t.transform(&:to_s) # similar, but Proc-driven
+ t.transform(:foo => :upcase, ...) # specific-attrs tranformation
+ t.transform([:to_s, :upcase]) # chain-transformation
  r.union(right) # relational union
+ r.where(predicate) # alias for restrict(predicate)
  ```

- ## Who is behind Bmg?
+ ## How is this different?
+
+ ### ... from similar libraries?
+
+ 1. The libraries you probably know (Sequel, Arel, SQLAlchemy, Korma, jOOQ,
+    etc.) do not implement a genuine relational algebra. Their support for
+    chaining relational operators is thus limited (restricting your expression
+    power and/or raising errors and/or outputting wrong or counterintuitive
+    SQL code). Bmg **always** allows chaining operators. If it does not, it's
+    a bug.
+
+    For instance the expression below is 100% valid in Bmg. The last where
+    clause applies to the result of the summarize (while SQL requires a `HAVING`
+    clause, or a `SELECT ... FROM (SELECT ...) r`).
+
+    ```ruby
+    relation
+      .where(...)
+      .union(...)
+      .summarize(...) # aka group by
+      .where(...)
+    ```
+
+ 2. Bmg supports in memory relations, json relations, csv relations, SQL
+    relations and so on. It's not tight to SQL generation, and supports
+    queries accross multiple data sources.
+
+ 3. Bmg makes a best effort to optimize queries, simplifying both generated
+    SQL code (low-level accesses to datasources) and in-memory operations.
+
+ 4. Bmg supports various *structuring* operators (group, image, autowrap,
+    autosummarize, etc.) and allows building 'non flat' relations.
+
+ 5. Bmg can use full ruby power when that helps (e.g. regular expressions in
+    WHERE clauses or ruby code in EXTEND clauses). This may prevent Bmg from
+    delegating work to underlying data sources (e.g. SQL server) and should
+    therefore be used with care though.
+
+ ### ... from Alf?
+
+ If you use Alf (or used it in the past), below are the main differences between
+ Bmg and Alf. Bmg has NOT been written to be API-compatible with Alf and will
+ probably never be.
+
+ 1. Bmg's implementation is much simpler than Alf and uses no ruby core
+    extention.
+
+ 2. We are confident using Bmg in production. Systematic inspection of query
+    plans is advised though. Alf was a bit too experimental to be used on
+    (critical) production systems.
+
+ 3. Alf exposes a functional syntax, command line tool, restful tools and
+    many more. Bmg is limited to the core algebra, main Relation abstraction
+    and SQL generation.

- Bernard Lambeau (bernard@klaro.cards) is Alf & Bmg main engineer & maintainer.
+ 4. Bmg is less strict regarding conformance to relational theory, and
+    may actually expose non relational features (such as support for null,
+    left_join operator, etc.). Sharp tools hurt, use them with care.
+
+ 5. Unlike Alf::Relation instances of Bmg::Relation capture query-trees, not
+    values. Currently two instances `r1` and `r2` are not equal even if they
+    define the same mathematical relation. As a consequence joining on
+    relation-valued attributes does not work as expected in Bmg until further
+    notice.
+
+ 6. Bmg does not implement all operators documented on try-alf.org, even if
+    we plan to eventually support most of them.
+
+ 7. Bmg has a few additional operators that prove very useful on real
+    production use cases: prefix, suffix, autowrap, autosummarize, left_join,
+    rxmatch, etc.
+
+ 8. Bmg optimizes queries and compiles them to SQL on the fly, while Alf was
+    building an AST internally first. Strictly speaking this makes Bmg less
+    powerful than Alf since optimizations cannot be turned off for now.
+
+ ## Contribute
+
+ Please use github issues and pull requests for all questions, bug reports,
+ and contributions. Don't hesitate to get in touch with us with an early code
+ spike if you plan to add non trivial features.
+
+ ## Licence
+
+ This software is distributed by Enspirit SRL under a MIT Licence. Please
+ contact Bernard Lambeau (blambeau@gmail.com) with any question.

  Enspirit (https://enspirit.be) and Klaro App (https://klaro.cards) are both
  actively using and contributing to the library.
-
- Feel free to contact us for help, ideas and/or contributions. Please use github
- issues and pull requests if possible if code is involved.
data/lib/bmg.rb CHANGED
@@ -1,6 +1,7 @@
  require 'path'
  require 'predicate'
  require 'forwardable'
+ require 'set'
  module Bmg

    def in_memory(enumerable, type = Type::ANY)
@@ -8,6 +9,11 @@ module Bmg
    end
    module_function :in_memory

+   def text_file(path, options = {}, type = Type::ANY)
+     Reader::TextFile.new(type, path, options).spied(main_spy)
+   end
+   module_function :text_file
+
    def csv(path, options = {}, type = Type::ANY)
      Reader::Csv.new(type, path, options).spied(main_spy)
    end
@@ -44,6 +50,7 @@ module Bmg
    require_relative 'bmg/relation/in_memory'
    require_relative 'bmg/relation/spied'
    require_relative 'bmg/relation/materialized'
+   require_relative 'bmg/relation/proxy'

    # Deprecated
    Leaf = Relation::InMemory
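
Reviewer note on `text_file`: the README text in this diff says that tuples carry `:line` (starting at 1) and stripped `:text`, and that with a `parse:` Regexp the named captures become attributes while non-matching lines are skipped. The plain-Ruby sketch below only illustrates that documented behaviour. `lines_to_tuples` is a hypothetical helper, not Bmg's `Reader::TextFile`, and it assumes that `:line` keeps the original file line number when lines are skipped.

```ruby
require 'stringio'

# Hypothetical stand-in for the behaviour the README describes for
# Bmg.text_file. This is NOT the library's implementation.
def lines_to_tuples(io, parse: nil)
  tuples = []
  io.each_line.with_index(1) do |raw, lineno|
    text = raw.strip
    if parse.nil?
      # No parser given: one tuple per line, with its number and stripped text.
      tuples << { line: lineno, text: text }
    elsif (match = parse.match(text))
      # Named captures become attributes; non-matching lines are skipped.
      captures = match.names.to_h { |name| [name.to_sym, match[name]] }
      tuples << { line: lineno }.merge(captures)
    end
  end
  tuples
end

log = "GET /home\nPOST /login\nGET /about\n"
plain  = lines_to_tuples(StringIO.new(log))
parsed = lines_to_tuples(StringIO.new(log), parse: /GET (?<url>([^\s]+))/)
# parsed keeps lines 1 and 3 only, each with :line and :url
```

An Array of Hashes like `parsed` is the shape the README says `Bmg.in_memory` accepts.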
data/lib/bmg/algebra.rb CHANGED
@@ -172,6 +172,17 @@ module Bmg
      end
      protected :_summarize

+     def transform(transformation = nil, options = {}, &proc)
+       transformation, options = proc, (transformation || {}) unless proc.nil?
+       return self if transformation.is_a?(Hash) && transformation.empty?
+       _transform(self.type.transform(transformation, options), transformation, options)
+     end
+
+     def _transform(type, transformation, options)
+       Operator::Transform.new(type, self, transformation, options)
+     end
+     protected :_transform
+
      def union(other, options = {})
        return self if other.is_a?(Relation::Empty)
        _union self.type.union(other.type), other, options
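
Reviewer note on the `transform` signature: when a block is passed, the first line of the method moves the block into `transformation` and reinterprets the first positional argument as `options`. The sketch below isolates just that argument shuffle so it can be run on its own. `normalize_transform_args` is a hypothetical name, and nothing here touches `Operator::Transform` or the type system.

```ruby
# Mirrors only the first line of `transform` from the hunk above.
# `normalize_transform_args` is a hypothetical helper for illustration.
def normalize_transform_args(transformation = nil, options = {}, &proc)
  # With a block, the block is the transformation and the first
  # positional argument (if any) is taken as the options hash.
  transformation, options = proc, (transformation || {}) unless proc.nil?
  [transformation, options]
end

by_symbol = normalize_transform_args(:to_s)
by_block  = normalize_transform_args(&:upcase)
with_opts = normalize_transform_args({ flag: true }) { |v| v }
```

Under this reading, `transform(:to_s)` and `transform(&:to_s)` differ only in whether the transformation arrives as a Symbol or as a Proc.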
@@ -2,6 +2,14 @@ module Bmg
    module Algebra
      module Shortcuts

+       def where(predicate)
+         restrict(predicate)
+       end
+
+       def exclude(predicate)
+         restrict(!Predicate.coerce(predicate))
+       end
+
        def rxmatch(attrs, matcher, options = {})
          predicate = attrs.inject(Predicate.contradiction){|p,a|
            p | Predicate.match(a, matcher, options)
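
Reviewer note on `where` and `exclude`: both shortcuts are defined in terms of `restrict`, with `exclude` negating the coerced predicate first. The sketch below models that relationship in plain Ruby. `TinyRelation` is hypothetical and only understands hash-style equality predicates and callables; it does not use the `predicate` gem or any Bmg class.

```ruby
# Plain-Ruby illustration of where/exclude as thin wrappers over restrict.
# `TinyRelation` is hypothetical and is NOT part of Bmg.
class TinyRelation
  include Enumerable

  def initialize(tuples)
    @tuples = tuples
  end

  def each(&block)
    @tuples.each(&block)
  end

  # Keeps the tuples for which the predicate holds.
  def restrict(predicate)
    check = coerce(predicate)
    TinyRelation.new(select { |t| check.call(t) })
  end

  # Alias-like shortcut, as in the hunk above.
  def where(predicate)
    restrict(predicate)
  end

  # Restricts on the negation of the coerced predicate.
  def exclude(predicate)
    positive = coerce(predicate)
    restrict(->(t) { !positive.call(t) })
  end

  private

  # A Hash means "all these attributes equal these values".
  def coerce(predicate)
    return predicate if predicate.respond_to?(:call)
    ->(t) { predicate.all? { |attr, value| t[attr] == value } }
  end
end

suppliers = TinyRelation.new([
  { sid: "S1", status: 20 },
  { sid: "S2", status: 30 },
  { sid: "S3", status: 10 }
])
kept    = suppliers.exclude(status: 30).map { |t| t[:sid] }
matched = suppliers.where(status: 30).map { |t| t[:sid] }
```

In this model `exclude` and `where` with the same predicate partition the relation, which is the intent the README states with "shortcut for restrict(!predicate)".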
@@ -31,6 +39,12 @@ module Bmg
          self.image(right.rename(renaming), as, on.keys, options)
        end

+       def images(rights, on = [], options = {})
+         rights.each_pair.inject(self){|memo,(as,right)|
+           memo.image(right, as, on, options)
+         }
+       end
+
        def join(right, on = [])
          return super unless on.is_a?(Hash)
          renaming = Hash[on.map{|k,v| [v,k] }]