bmg 0.17.6 → 0.18.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: e67d11656619195bfaa20a9d55fff49d593ccaa3ad558d0876c82534977d4e04
4
- data.tar.gz: 7ecd9b926d4289feb0fec3efcafcf7b14c483b70f8c4e47b87e3ba038db64774
3
+ metadata.gz: 2c9b490328160c9816102f7dd34b3ee46207ab4362cd2af6d8623c9fc5d4b87a
4
+ data.tar.gz: d113dc14ae1811f2b5565af9e84a0ffbc82b7842c46284af7fcad3bc9f3b7470
5
5
  SHA512:
6
- metadata.gz: 4d99c96391c73c096e3e981155cbe53e637c909854a0e3fe92bae991eb7da1be7cf78188c87f9fb53cdbb90e5dd0d92b4fec0501f8d1fcf7f1cdb1900828d575
7
- data.tar.gz: f8ff33a830852b1e0c5648596ef5f11cae7fc5f7bd817c1ed30a73ce3da81a92bfb42871eecd928a3ce7b717499fb9a975ee8f174a1b013054af506ed056b825
6
+ metadata.gz: fd42b8787cc093c6452386760744d7480c64c8be186ab73b87a548a1b853643b46a6ad48cf019da9d77fddbb38c824eb0e0bef3ef5fa2f333f1d1240f5a0bd84
7
+ data.tar.gz: 7bd0cd32a28963123b03b59152fa03d6470bbdf1a3c6cfe0b17a6667a80d97e65d5309def4e31ec9cbb47353d409ad9ad2a6d1fbc9c2ec10797460852836ce35
data/Gemfile CHANGED
@@ -1,5 +1,2 @@
1
1
  source "https://rubygems.org"
2
2
  gemspec
3
-
4
- # gem "predicate", github: "enspirit/predicate", branch: "placeholders"
5
- # gem "predicate", path: "../predicate"
data/README.md CHANGED
@@ -1,16 +1,30 @@
1
1
  # Bmg, a relational algebra (Alf's successor)!
2
2
 
3
+ [![Build Status](https://travis-ci.com/enspirit/bmg.svg?branch=master)](https://travis-ci.com/enspirit/bmg)
4
+
3
5
  Bmg is a relational algebra implemented as a ruby library. It implements the
4
6
  [Relation as First-Class Citizen](http://www.try-alf.org/blog/2013-10-21-relations-as-first-class-citizen)
5
- paradigm contributed with Alf a few years ago.
6
-
7
- Like Alf, Bmg can be used to query relations in memory, from various files,
8
- SQL databases, and any data sources that can be seen as serving relations.
9
- Cross data-sources joins are supported, as with Alf.
10
-
11
- Unlike Alf, Bmg does not make any core ruby extension and exposes the
12
- object-oriented syntax only (not Alf's functional one). Bmg implementation is
13
- also much simpler, and make its easier to implement user-defined relations.
7
+ paradigm contributed with [Alf](http://www.try-alf.org/) a few years ago.
8
+
9
+ Bmg can be used to query relations in memory, from various files, SQL databases,
10
+ and any data source that can be seen as serving relations. Cross data-sources
11
+ joins are supported, as with Alf. For differences with Alf, see a section
12
+ further down this README.
13
+
14
+ ## Outline
15
+
16
+ * [Example](#example)
17
+ * [Where are base relations coming from?](#where-are-base-relations-coming-from)
18
+ * [Memory relations](#memory-relations)
19
+ * [Connecting to SQL databases](#connecting-to-sql-databases)
20
+ * [Reading files (csv, excel, text)](#reading-files-csv-excel-text)
21
+ * [Your own relations](#your-own-relations)
22
+ * [List of supported operators](#supported-operators)
23
+ * [How is this different?](#how-is-this-different)
24
+ * [... from similar libraries](#-from-similar-libraries)
25
+ * [... from Alf](#-from-alf)
26
+ * [Contribute](#contribute)
27
+ * [License](#license)
14
28
 
15
29
  ## Example
16
30
 
@@ -27,7 +41,7 @@ suppliers = Bmg::Relation.new([
27
41
  ])
28
42
 
29
43
  by_city = suppliers
30
- .restrict(Predicate.neq(status: 30))
44
+ .exclude(status: 30)
31
45
  .extend(upname: ->(t){ t[:name].upcase })
32
46
  .group([:sid, :name, :status], :suppliers_in)
33
47
 
@@ -35,76 +49,158 @@ puts JSON.pretty_generate(by_city)
35
49
  # [{...},...]
36
50
  ```
37
51
 
38
- ## Connecting to a SQL database
52
+ ## Where are base relations coming from?
53
+
54
+ Bmg sees relations as sets/enumerable of symbolized Ruby hashes. The following
55
+ sections show you how to get them in the first place, to enter Relationland.
56
+
57
+ ### Memory relations
58
+
59
+ If you have an Array of Hashes -- in fact any Enumerable -- you can easily get
60
+ a Relation using either `Bmg::Relation.new` or `Bmg.in_memory`.
61
+
62
+ ```ruby
63
+ # this...
64
+ r = Bmg::Relation.new [{id: 1}, {id: 2}]
65
+
66
+ # is the same as this...
67
+ r = Bmg.in_memory [{id: 1}, {id: 2}]
68
+
69
+ # entire algebra is available on `r`
70
+ ```
71
+
72
+ ### Connecting to SQL databases
39
73
 
40
- Bmg requires `sequel >= 3.0` to connect to SQL databases.
74
+ Bmg currently requires `sequel >= 3.0` to connect to SQL databases. You also
75
+ need to require `bmg/sequel`.
41
76
 
42
77
  ```ruby
43
78
  require 'sqlite3'
44
79
  require 'bmg'
45
80
  require 'bmg/sequel'
81
+ ```
46
82
 
47
- DB = Sequel.connect("sqlite://suppliers-and-parts.db")
83
+ Then `Bmg.sequel` serves relations for tables of your SQL database:
48
84
 
85
+ ```ruby
86
+ DB = Sequel.connect("sqlite://suppliers-and-parts.db")
49
87
  suppliers = Bmg.sequel(:suppliers, DB)
88
+ ```
89
+
90
+ The entire algebra is available on those relations. As long as you keep using
91
+ operators that can be translated to SQL, results remain SQL-able:
50
92
 
93
+ ```ruby
51
94
  big_suppliers = suppliers
52
- .restrict(Predicate.neq(status: 30))
95
+ .exclude(status: 30)
96
+ .project([:sid, :name])
53
97
 
54
98
  puts big_suppliers.to_sql
55
- # SELECT `t1`.`sid`, `t1`.`name`, `t1`.`status`, `t1`.`city` FROM `suppliers` AS 't1' WHERE (`t1`.`status` != 30)
99
+ # SELECT `t1`.`sid`, `t1`.`name` FROM `suppliers` AS 't1' WHERE (`t1`.`status` != 30)
100
+ ```
56
101
 
57
- puts JSON.pretty_generate(big_suppliers)
58
- # [{...},...]
102
+ Operators not translatable to SQL are available too (such as `group` below).
103
+ Bmg fallbacks to memory operators for them, but remains capable of pushing some
104
+ operators down the tree as illustrated below (the restriction on `:city` is
105
+ pushed to the SQL server):
106
+
107
+ ```ruby
108
+ Bmg.sequel(:suppliers, sequel_db)
109
+ .project([:sid, :name, :city])
110
+ .group([:sid, :name], :suppliers_in)
111
+ .restrict(city: ["Paris", "London"])
112
+ .debug
113
+
114
+ # (group
115
+ # (sequel SELECT `t1`.`sid`, `t1`.`name`, `t1`.`city` FROM `suppliers` AS 't1' WHERE (`t1`.`city` IN ('Paris', 'London')))
116
+ # [:sid, :name, :status]
117
+ # :suppliers_in
118
+ # {:array=>false})
59
119
  ```
60
120
 
61
- ## How is this different from similar libraries?
121
+ ### Reading files (csv, excel, text)
62
122
 
63
- 1. The libraries you probably know (Sequel, Arel, SQLAlchemy, Korma, jOOQ,
64
- etc.) do not implement a genuine relational algebra: their support for
65
- chaining relational operators is limited (yielding errors or wrong SQL
66
- queries). Bmg **always** allows chaining operators. If it does not, it's
67
- a bug. In other words, the following query is 100% valid:
123
+ Bmg provides simple adapters to read files and reach Relationland as soon as
124
+ possible.
68
125
 
69
- relation
70
- .restrict(...) # aka where
71
- .union(...)
72
- .summarize(...) # aka group by
73
- .restrict(...)
126
+ #### CSV files
74
127
 
75
- 2. Bmg supports in memory relations, json relations, csv relations, SQL
76
- relations and so on. It's not tight to SQL generation, and supports
77
- queries accross multiple data sources.
128
+ ```ruby
129
+ csv_options = { col_sep: ",", quote_char: '"' }
130
+ r = Bmg.csv("path/to/a/file.csv", csv_options)
131
+ ```
78
132
 
79
- 3. Bmg makes a best effort to optimize queries, simplifying both generated
80
- SQL code (low-level accesses to datasources) and in-memory operations.
133
+ Options are directly transmitted to `::CSV.new`, check ruby's standard
134
+ library.
81
135
 
82
- 4. Bmg supports various *structuring* operators (group, image, autowrap,
83
- autosummarize, etc.) and allows building 'non flat' relations.
136
+ #### Excel files
84
137
 
85
- ## How is this different from Alf?
138
+ You will need to add [`roo`](https://github.com/roo-rb/roo) to your Gemfile to
139
+ read `.xls` and `.xlsx` files with Bmg.
86
140
 
87
- 1. Bmg's implementation is much simpler than Alf, and uses no ruby core
88
- extention.
141
+ ```ruby
142
+ roo_options = { skip: 1 }
143
+ r = Bmg.excel("path/to/a/file.xls", roo_options)
144
+ ```
89
145
 
90
- 2. We are confident using Bmg in production. Systematic inspection of query
91
- plans is suggested though. Alf was a bit too experimental to be used on
92
- (critical) production systems.
146
+ Options are directly transmitted to `Roo::Spreadsheet.open`, check roo's
147
+ documentation.
93
148
 
94
- 2. Alf exposes a functional syntax, command line tool, restful tools and
95
- many more. Bmg is limited to the core algebra, main Relation abstraction
96
- and SQL generation.
149
+ #### Text files
97
150
 
98
- 3. Bmg is less strict regarding conformance to relational theory, and
99
- may actually expose non relational features (such as support for null,
100
- left_join operator, etc.). Sharp tools hurt, use them with great care.
151
+ There is also a straightforward way to read text files and convert lines to
152
+ tuples.
101
153
 
102
- 4. Bmg does not yet implement all operators documented on try-alf.org, even
103
- if we plan to eventually support them all.
154
+ ```ruby
155
+ r = Bmg.text_file("path/to/a/file.txt")
156
+ r.type.attrlist
157
+ # => [:line, :text]
158
+ ```
104
159
 
105
- 5. Bmg has a few additional operators that prove very useful on real
106
- production use cases: prefix, suffix, autowrap, autosummarize, left_join,
107
- rxmatch, etc.
160
+ Without options tuples will have `:line` and `:text` attributes, the former
161
+ being the line number (starting at 1) and the latter being the line itself
162
+ (stripped).
163
+
164
+ The are a couple of options (see `Bmg::Reader::Textfile`). The most useful one
165
+ is the use a of a Regexp with named captures to automatically extract
166
+ attributes:
167
+
168
+ ```ruby
169
+ r = Bmg.text_file("path/to/a/file.txt", parse: /GET (?<url>([^\s]+))/)
170
+ r.type.attrlist
171
+ # => [:line, :url]
172
+ ```
173
+
174
+ In this scenario, non matching lines are skipped. The `:line` attribute keeps
175
+ being used to have at least one candidate key (so to speak).
176
+
177
+ ### Your own relations
178
+
179
+ As noted earlier, Bmg has a simple relation interface where you only have to
180
+ provide an iteration of symbolized tuples.
181
+
182
+ ```ruby
183
+ class MyRelation
184
+ include Bmg::Relation
185
+
186
+ def each
187
+ yield(id: 1, name: "Alf", year: 2014)
188
+ yield(id: 2, name: "Bmg", year: 2018)
189
+ end
190
+ end
191
+
192
+ MyRelation.new
193
+ .restrict(Predicate.gt(:year, 2015))
194
+ .allbut([:year])
195
+ ```
196
+
197
+ As shown, creating adapters on top of various data source is straighforward.
198
+ Adapters can also participate to query optimization (such as pushing
199
+ restrictions down the tree) by overriding the underscored version of operators
200
+ (e.g. `_restrict`).
201
+
202
+ Have a look at `Bmg::Algebra` for the protocol and `Bmg::Sql::Relation` for an
203
+ example. Keep in touch with the team if you need some help.
108
204
 
109
205
  ## Supported operators
110
206
 
@@ -114,8 +210,10 @@ r.autowrap(split: '_') # structure a flat relation, split:
114
210
  r.autosummarize([:a, :b, ...], x: :sum) # (experimental) usual summarizers supported
115
211
  r.constants(x: 12, ...) # add constant attributes (sometimes useful in unions)
116
212
  r.extend(x: ->(t){ ... }, ...) # add computed attributes
213
+ r.exclude(predicate) # shortcut for restrict(!predicate)
117
214
  r.group([:a, :b, ...], :x) # relation-valued attribute from attributes
118
215
  r.image(right, :x, [:a, :b, ...]) # relation-valued attribute from another relation
216
+ r.images({:x => r1, :y => r2}, [:a, ...]) # shortcut over image(r1, :x, ...).image(r2, :y, ...)
119
217
  r.join(right, [:a, :b, ...]) # natural join on a join key
120
218
  r.join(right, :a => :x, :b => :y, ...) # natural join after right reversed renaming
121
219
  r.left_join(right, [:a, :b, ...], {...}) # left join with optional default right tuple
@@ -137,14 +235,95 @@ t.transform(&:to_s) # similar, but Proc-driven
137
235
  t.transform(:foo => :upcase, ...) # specific-attrs tranformation
138
236
  t.transform([:to_s, :upcase]) # chain-transformation
139
237
  r.union(right) # relational union
238
+ r.where(predicate) # alias for restrict(predicate)
140
239
  ```
141
240
 
142
- ## Who is behind Bmg?
241
+ ## How is this different?
242
+
243
+ ### ... from similar libraries?
244
+
245
+ 1. The libraries you probably know (Sequel, Arel, SQLAlchemy, Korma, jOOQ,
246
+ etc.) do not implement a genuine relational algebra. Their support for
247
+ chaining relational operators is thus limited (restricting your expression
248
+ power and/or raising errors and/or outputting wrong or counterintuitive
249
+ SQL code). Bmg **always** allows chaining operators. If it does not, it's
250
+ a bug.
251
+
252
+ For instance the expression below is 100% valid in Bmg. The last where
253
+ clause applies to the result of the summarize (while SQL requires a `HAVING`
254
+ clause, or a `SELECT ... FROM (SELECT ...) r`).
255
+
256
+ ```ruby
257
+ relation
258
+ .where(...)
259
+ .union(...)
260
+ .summarize(...) # aka group by
261
+ .where(...)
262
+ ```
263
+
264
+ 2. Bmg supports in memory relations, json relations, csv relations, SQL
265
+ relations and so on. It's not tight to SQL generation, and supports
266
+ queries accross multiple data sources.
267
+
268
+ 3. Bmg makes a best effort to optimize queries, simplifying both generated
269
+ SQL code (low-level accesses to datasources) and in-memory operations.
270
+
271
+ 4. Bmg supports various *structuring* operators (group, image, autowrap,
272
+ autosummarize, etc.) and allows building 'non flat' relations.
273
+
274
+ 5. Bmg can use full ruby power when that helps (e.g. regular expressions in
275
+ WHERE clauses or ruby code in EXTEND clauses). This may prevent Bmg from
276
+ delegating work to underlying data sources (e.g. SQL server) and should
277
+ therefore be used with care though.
278
+
279
+ ### ... from Alf?
280
+
281
+ If you use Alf (or used it in the past), below are the main differences between
282
+ Bmg and Alf. Bmg has NOT been written to be API-compatible with Alf and will
283
+ probably never be.
284
+
285
+ 1. Bmg's implementation is much simpler than Alf and uses no ruby core
286
+ extention.
287
+
288
+ 2. We are confident using Bmg in production. Systematic inspection of query
289
+ plans is advised though. Alf was a bit too experimental to be used on
290
+ (critical) production systems.
291
+
292
+ 3. Alf exposes a functional syntax, command line tool, restful tools and
293
+ many more. Bmg is limited to the core algebra, main Relation abstraction
294
+ and SQL generation.
143
295
 
144
- Bernard Lambeau (bernard@klaro.cards) is Alf & Bmg main engineer & maintainer.
296
+ 4. Bmg is less strict regarding conformance to relational theory, and
297
+ may actually expose non relational features (such as support for null,
298
+ left_join operator, etc.). Sharp tools hurt, use them with care.
299
+
300
+ 5. Unlike Alf::Relation instances of Bmg::Relation capture query-trees, not
301
+ values. Currently two instances `r1` and `r2` are not equal even if they
302
+ define the same mathematical relation. As a consequence joining on
303
+ relation-valued attributes does not work as expected in Bmg until further
304
+ notice.
305
+
306
+ 6. Bmg does not implement all operators documented on try-alf.org, even if
307
+ we plan to eventually support most of them.
308
+
309
+ 7. Bmg has a few additional operators that prove very useful on real
310
+ production use cases: prefix, suffix, autowrap, autosummarize, left_join,
311
+ rxmatch, etc.
312
+
313
+ 8. Bmg optimizes queries and compiles them to SQL on the fly, while Alf was
314
+ building an AST internally first. Strictly speaking this makes Bmg less
315
+ powerful than Alf since optimizations cannot be turned off for now.
316
+
317
+ ## Contribute
318
+
319
+ Please use github issues and pull requests for all questions, bug reports,
320
+ and contributions. Don't hesitate to get in touch with us with an early code
321
+ spike if you plan to add non trivial features.
322
+
323
+ ## Licence
324
+
325
+ This software is distributed by Enspirit SRL under a MIT Licence. Please
326
+ contact Bernard Lambeau (blambeau@gmail.com) with any question.
145
327
 
146
328
  Enspirit (https://enspirit.be) and Klaro App (https://klaro.cards) are both
147
329
  actively using and contributing to the library.
148
-
149
- Feel free to contact us for help, ideas and/or contributions. Please use github
150
- issues and pull requests if possible if code is involved.
data/lib/bmg.rb CHANGED
@@ -1,6 +1,7 @@
1
1
  require 'path'
2
2
  require 'predicate'
3
3
  require 'forwardable'
4
+ require 'set'
4
5
  module Bmg
5
6
 
6
7
  def in_memory(enumerable, type = Type::ANY)
@@ -8,6 +9,11 @@ module Bmg
8
9
  end
9
10
  module_function :in_memory
10
11
 
12
+ def text_file(path, options = {}, type = Type::ANY)
13
+ Reader::TextFile.new(type, path, options).spied(main_spy)
14
+ end
15
+ module_function :text_file
16
+
11
17
  def csv(path, options = {}, type = Type::ANY)
12
18
  Reader::Csv.new(type, path, options).spied(main_spy)
13
19
  end
data/lib/bmg/algebra.rb CHANGED
@@ -174,6 +174,7 @@ module Bmg
174
174
 
175
175
  def transform(transformation = nil, options = {}, &proc)
176
176
  transformation, options = proc, (transformation || {}) unless proc.nil?
177
+ return self if transformation.is_a?(Hash) && transformation.empty?
177
178
  _transform(self.type.transform(transformation, options), transformation, options)
178
179
  end
179
180
 
@@ -2,6 +2,14 @@ module Bmg
2
2
  module Algebra
3
3
  module Shortcuts
4
4
 
5
+ def where(predicate)
6
+ restrict(predicate)
7
+ end
8
+
9
+ def exclude(predicate)
10
+ restrict(!Predicate.coerce(predicate))
11
+ end
12
+
5
13
  def rxmatch(attrs, matcher, options = {})
6
14
  predicate = attrs.inject(Predicate.contradiction){|p,a|
7
15
  p | Predicate.match(a, matcher, options)
@@ -31,6 +39,12 @@ module Bmg
31
39
  self.image(right.rename(renaming), as, on.keys, options)
32
40
  end
33
41
 
42
+ def images(rights, on = [], options = {})
43
+ rights.each_pair.inject(self){|memo,(as,right)|
44
+ memo.image(right, as, on, options)
45
+ }
46
+ end
47
+
34
48
  def join(right, on = [])
35
49
  return super unless on.is_a?(Hash)
36
50
  renaming = Hash[on.map{|k,v| [v,k] }]
@@ -63,12 +63,38 @@ module Bmg
63
63
 
64
64
  protected ### optimization
65
65
 
66
+ def _allbut(type, butlist)
67
+ operand.allbut(self.butlist|butlist)
68
+ end
69
+
70
+ def _matching(type, right, on)
71
+ # Always possible to push the matching, since by construction
72
+ # `on` can only use attributes that have not been trown away,
73
+ # hence they exist on `operand` too.
74
+ operand.matching(right, on).allbut(butlist)
75
+ end
76
+
77
+ def _page(type, ordering, page_index, options)
78
+ return super unless self.preserving_key?
79
+ operand.page(ordering, page_index, options).allbut(butlist)
80
+ end
81
+
82
+ def _project(type, attrlist)
83
+ operand.project(attrlist)
84
+ end
85
+
66
86
  def _restrict(type, predicate)
67
87
  operand.restrict(predicate).allbut(butlist)
68
88
  end
69
89
 
70
90
  protected ### inspect
71
91
 
92
+ def preserving_key?
93
+ operand.type.knows_keys? && operand.type.keys.find{|k|
94
+ (k & butlist).empty?
95
+ }
96
+ end
97
+
72
98
  def args
73
99
  [ butlist ]
74
100
  end
@@ -24,6 +24,22 @@ module Bmg
24
24
 
25
25
  public
26
26
 
27
+ def self.same(*args)
28
+ Same.new(*args)
29
+ end
30
+
31
+ def self.group(*args)
32
+ Group.new(*args)
33
+ end
34
+
35
+ def self.y_by_x(*args)
36
+ YByX.new(*args)
37
+ end
38
+
39
+ def self.ys_by_x(*args)
40
+ YsByX.new(*args)
41
+ end
42
+
27
43
  def each(&bl)
28
44
  h = {}
29
45
  @operand.each do |tuple|
@@ -41,6 +57,12 @@ module Bmg
41
57
  [:autosummarize, operand.to_ast, by.dup, sums.dup]
42
58
  end
43
59
 
60
+ public ### for internal reasons
61
+
62
+ def _count
63
+ operand._count
64
+ end
65
+
44
66
  protected
45
67
 
46
68
  def _restrict(type, predicate)
@@ -175,11 +197,11 @@ module Bmg
175
197
  end
176
198
 
177
199
  def init(v)
178
- [v]
200
+ v.nil? ? [] : [v]
179
201
  end
180
202
 
181
203
  def sum(v1, v2)
182
- v1 << v2
204
+ v2.nil? ? v1 : (v1 << v2)
183
205
  end
184
206
 
185
207
  def term(v)
@@ -211,11 +233,11 @@ module Bmg
211
233
  end
212
234
 
213
235
  def init(v)
214
- [v]
236
+ v.nil? ? [] : [v]
215
237
  end
216
238
 
217
239
  def sum(v1, v2)
218
- v1 << v2
240
+ v2.nil? ? v1 : (v1 << v2)
219
241
  end
220
242
 
221
243
  def term(v)
@@ -52,6 +52,12 @@ module Bmg
52
52
  [ :autowrap, operand.to_ast, @original_options.dup ]
53
53
  end
54
54
 
55
+ public ### for internal reasons
56
+
57
+ def _count
58
+ operand._count
59
+ end
60
+
55
61
  protected ### optimization
56
62
 
57
63
  def _autowrap(type, opts)
@@ -86,6 +92,16 @@ module Bmg
86
92
  false
87
93
  end
88
94
 
95
+ def _matching(type, right, on)
96
+ if (wrapped_roots! & on).empty?
97
+ operand.matching(right, on).autowrap(options)
98
+ else
99
+ super
100
+ end
101
+ rescue UnknownAttributesError
102
+ super
103
+ end
104
+
89
105
  def _page(type, ordering, page_index, opts)
90
106
  attrs = ordering.map{|(a,d)| a }
91
107
  if (wrapped_roots! & attrs).empty?
@@ -97,6 +113,16 @@ module Bmg
97
113
  super
98
114
  end
99
115
 
116
+ def _project(type, attrlist)
117
+ if (wrapped_roots! & attrlist).empty?
118
+ operand.project(attrlist).autowrap(options)
119
+ else
120
+ super
121
+ end
122
+ rescue UnknownAttributesError
123
+ super
124
+ end
125
+
100
126
  def _rename(type, renaming)
101
127
  # 1. Can't optimize if renaming applies to a wrapped one
102
128
  return super unless (wrapped_roots! & renaming.keys).empty?
@@ -54,6 +54,12 @@ module Bmg
54
54
  [ :constants, operand.to_ast, constants.dup ]
55
55
  end
56
56
 
57
+ public ### for internal reasons
58
+
59
+ def _count
60
+ operand._count
61
+ end
62
+
57
63
  protected ### optimization
58
64
 
59
65
  def _page(type, ordering, page_index, options)
@@ -53,6 +53,12 @@ module Bmg
53
53
  [ :extend, operand.to_ast, extension.dup ]
54
54
  end
55
55
 
56
+ public ### for internal reasons
57
+
58
+ def _count
59
+ operand._count
60
+ end
61
+
56
62
  protected ### optimization
57
63
 
58
64
  def _allbut(type, butlist)
@@ -99,9 +99,10 @@ module Bmg
99
99
  key = tuple_project(t, on)
100
100
  index[key].operand << tuple_image(t, on)
101
101
  end
102
- if options[:array]
102
+ if opt = options[:array]
103
+ sorter = to_sorter(opt)
103
104
  index = index.each_with_object({}) do |(k,v),ix|
104
- ix[k] = v.to_a
105
+ ix[k] = sorter ? v.to_a.sort(&sorter) : v.to_a
105
106
  end
106
107
  end
107
108
  index
@@ -154,8 +155,32 @@ module Bmg
154
155
  end
155
156
  end
156
157
 
158
+ public ### for internal reasons
159
+
160
+ def _count
161
+ left._count
162
+ end
163
+
157
164
  protected ### optimization
158
165
 
166
+ def _allbut(type, butlist)
167
+ if butlist.include?(as)
168
+ left.allbut(butlist - [as])
169
+ elsif (butlist & on).empty?
170
+ left.allbut(butlist).image(right, as, on, options)
171
+ else
172
+ super
173
+ end
174
+ end
175
+
176
+ def _matching(type, m_right, m_on)
177
+ if m_on.include?(as)
178
+ super
179
+ else
180
+ left.matching(m_right, m_on).image(right, as, on, options)
181
+ end
182
+ end
183
+
159
184
  def _page(type, ordering, page_index, opts)
160
185
  if ordering.map{|(k,v)| k}.include?(as)
161
186
  super
@@ -166,6 +191,14 @@ module Bmg
166
191
  end
167
192
  end
168
193
 
194
+ def _project(type, attrlist)
195
+ if attrlist.include?(as)
196
+ super
197
+ else
198
+ left.project(attrlist)
199
+ end
200
+ end
201
+
169
202
  def _restrict(type, predicate)
170
203
  on_as, rest = predicate.and_split([as])
171
204
  if rest.tautology?
@@ -227,6 +260,11 @@ module Bmg
227
260
  Relation::InMemory.new(image_type, Set.new)
228
261
  end
229
262
 
263
+ def to_sorter(opt)
264
+ return nil unless opt.is_a?(Array)
265
+ Ordering.new(opt).comparator
266
+ end
267
+
230
268
  public
231
269
 
232
270
  def to_s
@@ -45,13 +45,7 @@ module Bmg
45
45
  protected ### inspect
46
46
 
47
47
  def comparator
48
- ->(t1, t2) {
49
- ordering.each do |(attr,direction)|
50
- c = t1[attr] <=> t2[attr]
51
- return (direction == :desc ? -c : c) unless c==0
52
- end
53
- 0
54
- }
48
+ Ordering.new(@ordering).comparator
55
49
  end
56
50
 
57
51
  def args
@@ -31,7 +31,7 @@ module Bmg
31
31
  def each
32
32
  seen = {}
33
33
  @operand.each do |tuple|
34
- projected = project(tuple)
34
+ projected = tuple_project(tuple)
35
35
  unless seen.has_key?(projected)
36
36
  yield(projected)
37
37
  seen[projected] = true
@@ -74,7 +74,7 @@ module Bmg
74
74
 
75
75
  private
76
76
 
77
- def project(tuple)
77
+ def tuple_project(tuple)
78
78
  tuple.dup.delete_if{|k,_| !@attrlist.include?(k) }
79
79
  end
80
80
 
@@ -60,6 +60,12 @@ module Bmg
60
60
  [ :rename, operand.to_ast, renaming.dup ]
61
61
  end
62
62
 
63
+ public ### for internal reasons
64
+
65
+ def _count
66
+ operand._count
67
+ end
68
+
63
69
  protected ### optimization
64
70
 
65
71
  def _page(type, ordering, page_index, options)
@@ -23,7 +23,7 @@ module Bmg
23
23
 
24
24
  protected
25
25
 
26
- attr_reader :transformation
26
+ attr_reader :transformation, :options
27
27
 
28
28
  public
29
29
 
@@ -40,6 +40,43 @@ module Bmg
40
40
 
41
41
  protected ### optimization
42
42
 
43
+ def _allbut(type, butlist)
44
+ # `allbut` can always be pushed down the tree. unlike
45
+ # `extend` the Proc that might be used cannot use attributes
46
+ # in butlist, so it's safe to strip them away.
47
+ if transformer.knows_attrlist?
48
+ # We just need to clean the transformation
49
+ attrlist = transformer.to_attrlist
50
+ thrown = attrlist & butlist
51
+ t = transformation.dup.reject{|k,v| thrown.include?(k) }
52
+ operand.allbut(butlist).transform(t, options)
53
+ else
54
+ operand.allbut(butlist).transform(transformation, options)
55
+ end
56
+ end
57
+
58
+ def _project(type, attrlist)
59
+ if transformer.knows_attrlist?
60
+ t = transformation.dup.select{|k,v| attrlist.include?(k) }
61
+ operand.project(attrlist).transform(t, options)
62
+ else
63
+ operand.project(attrlist).transform(transformation, options)
64
+ end
65
+ end
66
+
67
+ def _restrict(type, predicate)
68
+ return super unless transformer.knows_attrlist?
69
+ top, bottom = predicate.and_split(transformer.to_attrlist)
70
+ if top == predicate
71
+ super
72
+ else
73
+ operand
74
+ .restrict(bottom)
75
+ .transform(transformation, options)
76
+ .restrict(top)
77
+ end
78
+ end
79
+
43
80
  protected ### inspect
44
81
 
45
82
  def args
data/lib/bmg/reader.rb CHANGED
@@ -9,3 +9,4 @@ module Bmg
9
9
  end
10
10
  require_relative "reader/csv"
11
11
  require_relative "reader/excel"
12
+ require_relative "reader/text_file"
@@ -0,0 +1,56 @@
1
+ module Bmg
2
+ module Reader
3
+ class TextFile
4
+ include Reader
5
+
6
+ DEFAULT_OPTIONS = {
7
+ strip: true,
8
+ parse: nil
9
+ }
10
+
11
+ def initialize(type, path, options = {})
12
+ options = { parse: options } if options.is_a?(Regexp)
13
+ @path = path
14
+ @options = DEFAULT_OPTIONS.merge(options)
15
+ @type = infer_type(type)
16
+ end
17
+ attr_reader :path, :options
18
+
19
+ public # Relation
20
+
21
+ def each
22
+ path.each_line.each_with_index do |text, line|
23
+ text = text.strip if strip?
24
+ parsed = parse(text)
25
+ yield({line: 1+line}.merge(parsed)) if parsed
26
+ end
27
+ end
28
+
29
+ private
30
+
31
+ def infer_type(base)
32
+ return base unless base == Bmg::Type::ANY
33
+ attr_list = if rx = options[:parse]
34
+ [:line] + rx.names.map(&:to_sym)
35
+ else
36
+ [:line, :text]
37
+ end
38
+ base
39
+ .with_attrlist(attr_list)
40
+ .with_keys([[:line]])
41
+ end
42
+
43
+ def strip?
44
+ options[:strip]
45
+ end
46
+
47
+ def parse(text)
48
+ return { text: text } unless rx = options[:parse]
49
+ if match = rx.match(text)
50
+ TupleAlgebra.symbolize_keys(match.named_captures)
51
+ end
52
+ end
53
+
54
+ end # class TextFile
55
+ end # module Reader
56
+ end # module Bmg
data/lib/bmg/relation.rb CHANGED
@@ -17,6 +17,16 @@ module Bmg
17
17
  self
18
18
  end
19
19
 
20
+ def type
21
+ Bmg::Type::ANY
22
+ end
23
+
24
+ def with_type(type)
25
+ dup.tap{|r|
26
+ r.type = type
27
+ }
28
+ end
29
+
20
30
  def with_typecheck
21
31
  dup.tap{|r|
22
32
  r.type = r.type.with_typecheck
@@ -100,6 +110,18 @@ module Bmg
100
110
  end
101
111
  end
102
112
 
113
+ def count
114
+ if type.knows_keys?
115
+ project(type.keys.first)._count
116
+ else
117
+ self._count
118
+ end
119
+ end
120
+
121
+ def _count
122
+ to_a.size
123
+ end
124
+
103
125
  # Returns a json representation
104
126
  def to_json(*args, &bl)
105
127
  to_a.to_json(*args, &bl)
@@ -113,9 +135,10 @@ module Bmg
113
135
  # When no string_or_io is used, the method uses a string.
114
136
  #
115
137
  # The method always returns the string_or_io.
116
- def to_csv(options = {}, string_or_io = nil)
138
+ def to_csv(options = {}, string_or_io = nil, preferences = nil)
117
139
  options, string_or_io = {}, options unless options.is_a?(Hash)
118
- Writer::Csv.new(options).call(self, string_or_io)
140
+ string_or_io, preferences = nil, string_or_io if string_or_io.is_a?(Hash)
141
+ Writer::Csv.new(options, preferences).call(self, string_or_io)
119
142
  end
120
143
 
121
144
  # Converts to an sexpr expression.
@@ -19,6 +19,10 @@ module Bmg
19
19
  def each(&bl)
20
20
  end
21
21
 
22
+ def _count
23
+ 0
24
+ end
25
+
22
26
  def to_ast
23
27
  [ :empty ]
24
28
  end
@@ -17,6 +17,16 @@ module Bmg
17
17
  @operand.each(&bl)
18
18
  end
19
19
 
20
+ def _count
21
+ if operand.respond_to?(:count)
22
+ operand.count
23
+ elsif operand.respond_to?(:size)
24
+ operand.size
25
+ else
26
+ super
27
+ end
28
+ end
29
+
20
30
  def to_ast
21
31
  [ :in_memory, operand ]
22
32
  end
@@ -16,6 +16,12 @@ module Bmg
16
16
  end
17
17
  protected :type=
18
18
 
19
+ public
20
+
21
+ def _count
22
+ operand._count
23
+ end
24
+
19
25
  public
20
26
 
21
27
  def each(&bl)
@@ -24,10 +24,15 @@ module Bmg
24
24
  protected :type=
25
25
 
26
26
  def each(&bl)
27
- spy.call(self)
27
+ spy.call(self) if bl
28
28
  operand.each(&bl)
29
29
  end
30
30
 
31
+ def count
32
+ spy.call(self) if bl
33
+ operand.count
34
+ end
35
+
31
36
  def to_ast
32
37
  [ :spied, operand.to_ast, spy ]
33
38
  end
@@ -33,6 +33,10 @@ module Bmg
33
33
  base_table.update(arg)
34
34
  end
35
35
 
36
+ def _count
37
+ dataset.count
38
+ end
39
+
36
40
  def to_ast
37
41
  [:sequel, dataset.sql]
38
42
  end
@@ -10,7 +10,6 @@ module Bmg
10
10
  end
11
11
 
12
12
  attr_accessor :type
13
- protected :type=
14
13
 
15
14
  protected
16
15
 
data/lib/bmg/support.rb CHANGED
@@ -1,3 +1,5 @@
1
1
  require_relative 'support/tuple_algebra'
2
2
  require_relative 'support/tuple_transformer'
3
3
  require_relative 'support/keys'
4
+ require_relative 'support/ordering'
5
+ require_relative 'support/output_preferences'
@@ -0,0 +1,20 @@
1
+ module Bmg
2
+ class Ordering
3
+
4
+ def initialize(attrs)
5
+ @attrs = attrs
6
+ end
7
+ attr_reader :attrs
8
+
9
+ def comparator
10
+ ->(t1, t2) {
11
+ attrs.each do |(attr,direction)|
12
+ c = t1[attr] <=> t2[attr]
13
+ return (direction == :desc ? -c : c) unless c==0
14
+ end
15
+ 0
16
+ }
17
+ end
18
+
19
+ end # class Ordering
20
+ end # module Bmg
@@ -0,0 +1,44 @@
1
+ module Bmg
2
+ class OutputPreferences
3
+
4
+ DEFAULT_PREFS = {
5
+ attributes_ordering: nil,
6
+ extra_attributes: :after
7
+ }
8
+
9
+ def initialize(options)
10
+ @options = DEFAULT_PREFS.merge(options)
11
+ end
12
+ attr_reader :options
13
+
14
+ def self.dress(arg)
15
+ return arg if arg.is_a?(OutputPreferences)
16
+ arg = {} if arg.nil?
17
+ new(arg)
18
+ end
19
+
20
+ def attributes_ordering
21
+ options[:attributes_ordering]
22
+ end
23
+
24
+ def extra_attributes
25
+ options[:extra_attributes]
26
+ end
27
+
28
+ def order_attrlist(attrlist)
29
+ return attrlist if attributes_ordering.nil?
30
+ index = Hash[attributes_ordering.each_with_index.to_a]
31
+ attrlist.sort{|a,b|
32
+ ai, bi = index[a], index[b]
33
+ if ai && bi
34
+ ai <=> bi
35
+ elsif ai
36
+ extra_attributes == :after ? -1 : 1
37
+ else
38
+ extra_attributes == :after ? 1 : -1
39
+ end
40
+ }
41
+ end
42
+
43
+ end # class OutputPreferences
44
+ end # module Bmg
@@ -19,5 +19,11 @@ module Bmg
19
19
  end
20
20
  module_function :rename
21
21
 
22
+ def symbolize_keys(h)
23
+ return h if h.empty?
24
+ h.each_with_object({}){|(k,v),h| h[k.to_sym] = v }
25
+ end
26
+ module_function :symbolize_keys
27
+
22
28
  end # module TupleAlgebra
23
29
  end # module Bmg
@@ -26,11 +26,7 @@ module Bmg
26
26
 
27
27
  def transform_tuple(tuple, with)
28
28
  case with
29
- when Symbol
30
- tuple.each_with_object({}){|(k,v),dup|
31
- dup[k] = transform_attr(v, with)
32
- }
33
- when Proc
29
+ when Symbol, Proc, Regexp
34
30
  tuple.each_with_object({}){|(k,v),dup|
35
31
  dup[k] = transform_attr(v, with)
36
32
  }
@@ -51,8 +47,13 @@ module Bmg
51
47
  case with
52
48
  when Symbol
53
49
  value.send(with)
50
+ when Regexp
51
+ m = with.match(value.to_s)
52
+ m.nil? ? m : m.to_s
54
53
  when Proc
55
54
  with.call(value)
55
+ when Hash
56
+ with[value]
56
57
  else
57
58
  raise ArgumentError, "Unexpected transformation `#{with.inspect}`"
58
59
  end
data/lib/bmg/version.rb CHANGED
@@ -1,8 +1,8 @@
1
1
  module Bmg
2
2
  module Version
3
3
  MAJOR = 0
4
- MINOR = 17
5
- TINY = 6
4
+ MINOR = 18
5
+ TINY = 2
6
6
  end
7
7
  VERSION = "#{Version::MAJOR}.#{Version::MINOR}.#{Version::TINY}"
8
8
  end
@@ -6,26 +6,37 @@ module Bmg
6
6
  DEFAULT_OPTIONS = {
7
7
  }
8
8
 
9
- def initialize(options)
10
- @options = DEFAULT_OPTIONS.merge(options)
9
+ def initialize(csv_options, output_preferences = nil)
10
+ @csv_options = DEFAULT_OPTIONS.merge(csv_options)
11
+ @output_preferences = OutputPreferences.dress(output_preferences)
11
12
  end
12
- attr_reader :options
13
+ attr_reader :csv_options, :output_preferences
13
14
 
14
15
  def call(relation, string_or_io = nil)
15
16
  require 'csv'
16
17
  string_or_io, to_s = string_or_io.nil? ? [StringIO.new, true] : [string_or_io, false]
17
- headers = relation.type.to_attrlist if relation.type.knows_attrlist?
18
- csv = nil
18
+ headers, csv = infer_headers(relation.type), nil
19
19
  relation.each do |tuple|
20
20
  if csv.nil?
21
- headers = tuple.keys if headers.nil?
22
- csv = CSV.new(string_or_io, options.merge(headers: headers))
21
+ headers = infer_headers(tuple) if headers.nil?
22
+ csv = CSV.new(string_or_io, csv_options.merge(headers: headers))
23
23
  end
24
24
  csv << headers.map{|h| tuple[h] }
25
25
  end
26
26
  to_s ? string_or_io.string : string_or_io
27
27
  end
28
28
 
29
+ private
30
+
31
+ def infer_headers(from)
32
+ attrlist = if from.is_a?(Type) && from.knows_attrlist?
33
+ from.to_attrlist
34
+ elsif from.is_a?(Hash)
35
+ from.keys
36
+ end
37
+ attrlist ? output_preferences.order_attrlist(attrlist) : nil
38
+ end
39
+
29
40
  end # class Csv
30
41
  end # module Writer
31
42
  end # module Bmg
data/tasks/test.rake CHANGED
@@ -6,17 +6,24 @@ namespace :test do
6
6
  desc "Runs unit tests"
7
7
  RSpec::Core::RakeTask.new(:unit) do |t|
8
8
  t.pattern = "spec/unit/**/test_*.rb"
9
- t.rspec_opts = ["-Ilib", "-Ispec/unit", "--fail-fast", "--color", "--backtrace", "--format=progress"]
9
+ t.rspec_opts = ["-Ilib", "-Ispec/unit", "--color", "--backtrace", "--format=progress"]
10
10
  end
11
11
  tests << :unit
12
12
 
13
13
  desc "Runs integration tests"
14
14
  RSpec::Core::RakeTask.new(:integration) do |t|
15
15
  t.pattern = "spec/integration/**/test_*.rb"
16
- t.rspec_opts = ["-Ilib", "-Ispec/integration", "--fail-fast", "--color", "--backtrace", "--format=progress"]
16
+ t.rspec_opts = ["-Ilib", "-Ispec/integration", "--color", "--backtrace", "--format=progress"]
17
17
  end
18
18
  tests << :integration
19
19
 
20
+ desc "Runs github regression tests"
21
+ RSpec::Core::RakeTask.new(:regression) do |t|
22
+ t.pattern = "spec/regression/**/test_*.rb"
23
+ t.rspec_opts = ["-Ilib", "-Ispec/regression", "--color", "--backtrace", "--format=progress"]
24
+ end
25
+ tests << :regression
26
+
20
27
  task :all => tests
21
28
  end
22
29
 
metadata CHANGED
@@ -1,49 +1,49 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bmg
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.17.6
4
+ version: 0.18.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Bernard Lambeau
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2020-08-28 00:00:00.000000000 Z
11
+ date: 2021-04-16 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: predicate
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
- - - "~>"
18
- - !ruby/object:Gem::Version
19
- version: '2.4'
20
17
  - - ">="
21
18
  - !ruby/object:Gem::Version
22
- version: 2.4.0
19
+ version: 2.5.0
20
+ - - "~>"
21
+ - !ruby/object:Gem::Version
22
+ version: '2.5'
23
23
  type: :runtime
24
24
  prerelease: false
25
25
  version_requirements: !ruby/object:Gem::Requirement
26
26
  requirements:
27
- - - "~>"
28
- - !ruby/object:Gem::Version
29
- version: '2.4'
30
27
  - - ">="
31
28
  - !ruby/object:Gem::Version
32
- version: 2.4.0
29
+ version: 2.5.0
30
+ - - "~>"
31
+ - !ruby/object:Gem::Version
32
+ version: '2.5'
33
33
  - !ruby/object:Gem::Dependency
34
34
  name: path
35
35
  requirement: !ruby/object:Gem::Requirement
36
36
  requirements:
37
37
  - - ">="
38
38
  - !ruby/object:Gem::Version
39
- version: '1.3'
39
+ version: '2.0'
40
40
  type: :runtime
41
41
  prerelease: false
42
42
  version_requirements: !ruby/object:Gem::Requirement
43
43
  requirements:
44
44
  - - ">="
45
45
  - !ruby/object:Gem::Version
46
- version: '1.3'
46
+ version: '2.0'
47
47
  - !ruby/object:Gem::Dependency
48
48
  name: rake
49
49
  requirement: !ruby/object:Gem::Requirement
@@ -78,14 +78,14 @@ dependencies:
78
78
  requirements:
79
79
  - - ">="
80
80
  - !ruby/object:Gem::Version
81
- version: '2.7'
81
+ version: '2.8'
82
82
  type: :development
83
83
  prerelease: false
84
84
  version_requirements: !ruby/object:Gem::Requirement
85
85
  requirements:
86
86
  - - ">="
87
87
  - !ruby/object:Gem::Version
88
- version: '2.7'
88
+ version: '2.8'
89
89
  - !ruby/object:Gem::Dependency
90
90
  name: sequel
91
91
  requirement: !ruby/object:Gem::Requirement
@@ -154,6 +154,7 @@ files:
154
154
  - lib/bmg/reader.rb
155
155
  - lib/bmg/reader/csv.rb
156
156
  - lib/bmg/reader/excel.rb
157
+ - lib/bmg/reader/text_file.rb
157
158
  - lib/bmg/relation.rb
158
159
  - lib/bmg/relation/empty.rb
159
160
  - lib/bmg/relation/in_memory.rb
@@ -260,6 +261,8 @@ files:
260
261
  - lib/bmg/summarizer/variance.rb
261
262
  - lib/bmg/support.rb
262
263
  - lib/bmg/support/keys.rb
264
+ - lib/bmg/support/ordering.rb
265
+ - lib/bmg/support/output_preferences.rb
263
266
  - lib/bmg/support/tuple_algebra.rb
264
267
  - lib/bmg/support/tuple_transformer.rb
265
268
  - lib/bmg/type.rb
@@ -287,7 +290,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
287
290
  - !ruby/object:Gem::Version
288
291
  version: '0'
289
292
  requirements: []
290
- rubygems_version: 3.1.2
293
+ rubygems_version: 3.0.8
291
294
  signing_key:
292
295
  specification_version: 4
293
296
  summary: Bmg is Alf's successor.