bmg 0.22.0 → 0.23.0
- checksums.yaml +4 -4
- data/README.md +155 -16
- data/lib/bmg/database/data_folder.rb +67 -0
- data/lib/bmg/database/sequel.rb +35 -0
- data/lib/bmg/database/xlsx.rb +41 -0
- data/lib/bmg/database.rb +35 -0
- data/lib/bmg/error.rb +3 -0
- data/lib/bmg/reader/excel.rb +1 -80
- data/lib/bmg/reader/xlsx.rb +80 -0
- data/lib/bmg/sequel.rb +1 -0
- data/lib/bmg/summarizer/bucketize.rb +82 -0
- data/lib/bmg/summarizer/concat.rb +1 -1
- data/lib/bmg/summarizer.rb +1 -0
- data/lib/bmg/version.rb +1 -1
- data/lib/bmg/writer/xlsx.rb +15 -1
- data/lib/bmg/xlsx.rb +3 -0
- data/lib/bmg.rb +12 -0
- metadata +9 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 542c4218634ee7ae5b224400ee07ec6d2998473a
|
4
|
+
data.tar.gz: 37eddfc05f9fcfca90e96c50fc6628a09cdf8742
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 9c5009a14fa10be21fc6d76e23fb47a8e0e69e9dd300478822c6bc9758230ff593cb0a3783b94b0a64c4afbd82456b362ee081a02e4a049c65759c9699a3f074
|
7
|
+
data.tar.gz: 5d73d4dad1eb72f3794cbcfddab422e01a197d9c128f6991243cf7d381e41f9ae0cf255f03aff5eb23ac3d4375de1da1345503e086ff01253dd6e636497727b4
|
data/README.md
CHANGED
@@ -1,6 +1,6 @@
|
|
1
|
-
# Bmg, a relational algebra
|
1
|
+
# Bmg, a relational algebra
|
2
2
|
|
3
|
-
Bmg is a relational algebra implemented as a Ruby library. It implements the
|
3
|
+
Bmg is a [relational algebra](https://www.relational-algebra.dev/) implemented as a Ruby library. It implements the
|
4
4
|
[Relation as First-Class Citizen](http://www.try-alf.org/blog/2013-10-21-relations-as-first-class-citizen)
|
5
5
|
paradigm contributed with [Alf](http://www.try-alf.org/) a few years ago.
|
6
6
|
|
@@ -9,16 +9,24 @@ and any data source that can be seen as serving relations. Cross data-sources
|
|
9
9
|
joins are supported, as with Alf. For differences with Alf, see a section
|
10
10
|
further down this README.
|
11
11
|
|
12
|
+
## Links
|
13
|
+
|
14
|
+
* Documentation can be found at https://www.relational-algebra.dev/
|
15
|
+
* Contribute to that documentation on github: https://github.com/enspirit/bmg-website
|
16
|
+
|
12
17
|
## Outline
|
13
18
|
|
14
19
|
* [Example](#example)
|
15
20
|
* [Where are base relations coming from?](#where-are-base-relations-coming-from)
|
16
21
|
* [Memory relations](#memory-relations)
|
17
22
|
* [Connecting to SQL databases](#connecting-to-sql-databases)
|
18
|
-
* [Reading files
|
23
|
+
* [Reading data files](#reading-data-files-json-csv-yaml-text-xls--xlsx)
|
19
24
|
* [Connecting to Redis databases](#connecting-to-redis-databases)
|
20
25
|
* [Your own relations](#your-own-relations)
|
26
|
+
* [The Database abstraction](#the-database-abstraction)
|
21
27
|
* [List of supported operators](#supported-operators)
|
28
|
+
* [List of supported predicates](#supported-predicates)
|
29
|
+
* [List of supported summaries](#supported-summaries)
|
22
30
|
* [How is this different?](#how-is-this-different)
|
23
31
|
* [... from similar libraries](#-from-similar-libraries)
|
24
32
|
* [... from Alf](#-from-alf)
|
@@ -117,33 +125,38 @@ Bmg.sequel(:suppliers, sequel_db)
|
|
117
125
|
# {:array=>false})
|
118
126
|
```
|
119
127
|
|
120
|
-
### Reading files (csv,
|
128
|
+
### Reading data files (json, csv, yaml, text, xls & xlsx)
|
121
129
|
|
122
130
|
Bmg provides simple adapters to read files and reach Relationland as soon as
|
123
131
|
possible.
|
124
132
|
|
125
|
-
####
|
133
|
+
#### JSON files
|
126
134
|
|
127
135
|
```ruby
|
128
|
-
|
129
|
-
r = Bmg.csv("path/to/a/file.csv", csv_options)
|
136
|
+
r = Bmg.json("path/to/a/file.json")
|
130
137
|
```
|
131
138
|
|
132
|
-
|
133
|
-
library.
|
139
|
+
The json file is expected to contain tuples of same heading.
|
134
140
|
|
135
|
-
####
|
141
|
+
#### YAML files
|
136
142
|
|
137
|
-
|
138
|
-
|
143
|
+
```ruby
|
144
|
+
r = Bmg.yaml("path/to/a/file.yaml")
|
145
|
+
```
|
146
|
+
|
147
|
+
The yaml file is expected to contain tuples of same heading.
|
148
|
+
|
149
|
+
#### CSV files
|
139
150
|
|
140
151
|
```ruby
|
141
|
-
|
142
|
-
r = Bmg.
|
152
|
+
csv_options = { col_sep: ",", quote_char: '"' }
|
153
|
+
r = Bmg.csv("path/to/a/file.csv", csv_options)
|
143
154
|
```
|
144
155
|
|
145
|
-
Options are directly transmitted to
|
146
|
-
|
156
|
+
Options are directly transmitted to `::CSV.new`, check Ruby's standard
|
157
|
+
library. If you don't provide them, `Bmg` uses `headers: true` (hence making
|
158
|
+
the assumption that attribute names are provided on the first line), and makes a
|
159
|
+
best effort to infer the column separator.
|
147
160
|
|
148
161
|
#### Text files
|
149
162
|
|
@@ -173,6 +186,19 @@ r.type.attrlist
|
|
173
186
|
In this scenario, non matching lines are skipped. The `:line` attribute keeps
|
174
187
|
being used to have at least one candidate key (so to speak).
|
175
188
|
|
189
|
+
#### Excel files
|
190
|
+
|
191
|
+
You will need to add [`roo`](https://github.com/roo-rb/roo) to your Gemfile to
|
192
|
+
read `.xls` and `.xlsx` files with Bmg.
|
193
|
+
|
194
|
+
```ruby
|
195
|
+
roo_options = { skip: 1 }
|
196
|
+
r = Bmg.excel("path/to/a/file.xls", roo_options)
|
197
|
+
```
|
198
|
+
|
199
|
+
Options are directly transmitted to `Roo::Spreadsheet.open`, check roo's
|
200
|
+
documentation.
|
201
|
+
|
176
202
|
### Connecting to Redis databases
|
177
203
|
|
178
204
|
Bmg currently requires `bmg-redis` and `redis >= 4.6` to connect
|
@@ -240,6 +266,58 @@ restrictions down the tree) by overriding the underscored version of operators
|
|
240
266
|
Have a look at `Bmg::Algebra` for the protocol and `Bmg::Sql::Relation` for an
|
241
267
|
example. Keep in touch with the team if you need some help.
|
242
268
|
|
269
|
+
## The Database abstraction
|
270
|
+
|
271
|
+
The previous section focused on obtaining *relations*. In practice you frequently
|
272
|
+
have a collection of relations hence a *database*:
|
273
|
+
|
274
|
+
* A SQL database with multiple tables
|
275
|
+
* A list of data files, all in the same folder
|
276
|
+
* An excel file with various sheets
|
277
|
+
|
278
|
+
Bmg supports a simple Database abstraction that serves those relations "by name",
|
279
|
+
in a simple way. A database can also be easily dumped back to a data folder of
|
280
|
+
json or csv files, or as simple xlsx files with multiple sheets.
|
281
|
+
|
282
|
+
### Connecting to a SQL Database
|
283
|
+
|
284
|
+
For a SQL database, connected with Sequel:
|
285
|
+
|
286
|
+
```
|
287
|
+
db = Bmg::Database.sequel(Sequel.connect('...'))
|
288
|
+
db.suppliers # yields a Bmg::Relation over the `suppliers` table
|
289
|
+
```
|
290
|
+
|
291
|
+
### Connecting to data files in the same folder
|
292
|
+
|
293
|
+
Data files all in the same folder can be seen as a very basic form of database,
|
294
|
+
and served as such. Bmg supports `json`, `csv` and `yaml` files:
|
295
|
+
|
296
|
+
```
|
297
|
+
db = Bmg::Database.data_folder('./my-database')
|
298
|
+
db.suppliers # yields a Bmg::Relation over the `suppliers.(json,csv,yml)` file
|
299
|
+
```
|
300
|
+
|
301
|
+
Bmg supports files in different formats in the same folder. When files with the
|
302
|
+
same basename exist, json is preferred over yaml, which is preferred over csv.
|
303
|
+
|
304
|
+
### Dumping a Database instance
|
305
|
+
|
306
|
+
As a data folder:
|
307
|
+
|
308
|
+
```
|
309
|
+
db = Bmg::Database.sequel(Sequel.connect('...'))
|
310
|
+
db.to_data_folder('path/to/folder', :json)
|
311
|
+
```
|
312
|
+
|
313
|
+
As an .xlsx file (any existing file will be erased, we don't support modifying
|
314
|
+
existing files):
|
315
|
+
|
316
|
+
```
|
317
|
+
require 'bmg/xlsx'
|
318
|
+
db.to_xlsx('path/to/file.xlsx')
|
319
|
+
```
|
320
|
+
|
243
321
|
## Supported operators
|
244
322
|
|
245
323
|
```ruby
|
@@ -283,6 +361,67 @@ r.unwrap(:a) # shortcut over unwrap([:a])
|
|
283
361
|
r.where(predicate) # alias for restrict(predicate)
|
284
362
|
```
|
285
363
|
|
364
|
+
## Supported Predicates
|
365
|
+
|
366
|
+
Usual operators are supported and map to their SQL equivalent as expected:
|
367
|
+
|
368
|
+
```ruby
|
369
|
+
Predicate.eq # =
|
370
|
+
Predicate.neq # <>
|
371
|
+
Predicate.lt # <
|
372
|
+
Predicate.lte # <=
|
373
|
+
Predicate.gt # >
|
374
|
+
Predicate.gte # >=
|
375
|
+
Predicate.in # SQL's IN
|
376
|
+
Predicate.is_null # SQL's IS NULL
|
377
|
+
```
|
378
|
+
|
379
|
+
See the [Predicate gem](https://github.com/enspirit/predicate) for a more
|
380
|
+
complete list.
|
381
|
+
|
382
|
+
Note: predicates that implement specific Ruby algorithms or patterns are
|
383
|
+
not compiled to SQL (and more generally not delegated to underlying database
|
384
|
+
servers).
|
385
|
+
|
386
|
+
## Supported Summaries
|
387
|
+
|
388
|
+
The `summarize` operator receives a list of `attr: summarizer` pairs, e.g.
|
389
|
+
|
390
|
+
```ruby
|
391
|
+
r.summarize([:city], {
|
392
|
+
how_many: :count, # same as how_many: Bmg::Summarizer.count
|
393
|
+
status: :max, # same as status: Bmg::Summarizer.max(:status)
|
394
|
+
min_status: Bmg::Summarizer.min(:status)
|
395
|
+
})
|
396
|
+
```
|
397
|
+
|
398
|
+
The following summarizers are available and translated to SQL:
|
399
|
+
|
400
|
+
```ruby
|
401
|
+
Bmg::Summarizer.count # count the number of tuples
|
402
|
+
Bmg::Summarizer.distinct(:a) # collect distinct values (as an array)
|
403
|
+
Bmg::Summarizer.distinct_count(:a) # count of distinct values
|
404
|
+
Bmg::Summarizer.min(:a) # min value for attribute :a
|
405
|
+
Bmg::Summarizer.max(:a) # max value
|
406
|
+
Bmg::Summarizer.sum(:a) # sum :a's values
|
407
|
+
Bmg::Summarizer.avg(:a) # average
|
408
|
+
```
|
409
|
+
|
410
|
+
The following summarizers are implemented in Ruby (they are supported when
|
411
|
+
querying SQL databases, but not compiled to SQL):
|
412
|
+
|
413
|
+
```ruby
|
414
|
+
Bmg::Summarizer.collect(:a) # collect :a's values (as an array)
|
415
|
+
Bmg::Summarizer.concat(:a, opts: { ... }) # concat :a's values (opts, e.g. {between: ','})
|
416
|
+
Bmg::Summarizer.first(:a, order: ...) # smallest seen :a's value according to a tuple ordering
|
417
|
+
Bmg::Summarizer.last(:a, order: ...) # largest seen :a's value according to a tuple ordering
|
418
|
+
Bmg::Summarizer.variance(:a) # variance
|
419
|
+
Bmg::Summarizer.stddev(:a) # standard deviation
|
420
|
+
Bmg::Summarizer.percentile(:a, nth) # (continuous) nth percentile
|
421
|
+
Bmg::Summarizer.percentile_disc(:a, nth) # discrete nth percentile
|
422
|
+
Bmg::Summarizer.value_by(:a, :by => :b) # { :b => :a } as a Hash
|
423
|
+
```
|
424
|
+
|
286
425
|
## How is this different?
|
287
426
|
|
288
427
|
### ... from similar libraries?
|
data/lib/bmg/database/data_folder.rb
ADDED
@@ -0,0 +1,67 @@
|
|
1
|
+
module Bmg
|
2
|
+
class Database
|
3
|
+
class DataFolder < Database
|
4
|
+
|
5
|
+
DEFAULT_OPTIONS = {
|
6
|
+
data_extensions: ['json', 'yml', 'yaml', 'csv']
|
7
|
+
}
|
8
|
+
|
9
|
+
def initialize(folder, options = {})
|
10
|
+
@folder = Path(folder)
|
11
|
+
@options = DEFAULT_OPTIONS.merge(options)
|
12
|
+
end
|
13
|
+
|
14
|
+
def method_missing(name, *args, &bl)
|
15
|
+
return super(name, *args, &bl) unless args.empty? && bl.nil?
|
16
|
+
raise NotSuchRelationError, name.to_s unless file = find_file(name)
|
17
|
+
read_file(file)
|
18
|
+
end
|
19
|
+
|
20
|
+
def each_relation_pair
|
21
|
+
return to_enum(:each_relation_pair) unless block_given?
|
22
|
+
|
23
|
+
@folder.glob('*') do |path|
|
24
|
+
next unless path.file?
|
25
|
+
next unless @options[:data_extensions].find {|ext|
|
26
|
+
path.ext == ".#{ext}" || path.ext == ext
|
27
|
+
}
|
28
|
+
yield(path.basename.rm_ext.to_sym, read_file(path))
|
29
|
+
end
|
30
|
+
end
|
31
|
+
|
32
|
+
def self.dump(database, path, ext = :json)
|
33
|
+
path = Path(path)
|
34
|
+
path.mkdir_p
|
35
|
+
database.each_relation_pair do |name, rel|
|
36
|
+
(path/"#{name}.#{ext}").write(rel.public_send(:"to_#{ext}"))
|
37
|
+
end
|
38
|
+
path
|
39
|
+
end
|
40
|
+
|
41
|
+
private
|
42
|
+
|
43
|
+
def read_file(file)
|
44
|
+
case file.ext.to_s
|
45
|
+
when '.json'
|
46
|
+
Bmg.json(file)
|
47
|
+
when '.yaml', '.yml'
|
48
|
+
Bmg.yaml(file)
|
49
|
+
when '.csv'
|
50
|
+
Bmg.csv(file)
|
51
|
+
else
|
52
|
+
raise NotSupportedError, "Unable to use #{file} as a relation"
|
53
|
+
end
|
54
|
+
end
|
55
|
+
|
56
|
+
def find_file(name)
|
57
|
+
exts = @options[:data_extensions]
|
58
|
+
exts.each do |ext|
|
59
|
+
target = @folder/"#{name}.#{ext}"
|
60
|
+
return target if target.file?
|
61
|
+
end
|
62
|
+
raise NotSuchRelationError, "#{@folder}/#{name}.#{exts.join(',')}"
|
63
|
+
end
|
64
|
+
|
65
|
+
end # class DataFolder
|
66
|
+
end # class Database
|
67
|
+
end # module Bmg
|
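Note on the hunk above: `find_file` walks `data_extensions` in order and returns the first existing file, which is what gives json precedence over yaml and csv. The following is a minimal sketch of that lookup using only the Ruby standard library instead of the `Path` helper. `DATA_EXTENSIONS` and `find_data_file` are illustrative names, not part of Bmg, and the sketch returns `nil` where the gem raises.

```ruby
require 'tmpdir'

# Same order as DEFAULT_OPTIONS[:data_extensions] in the hunk above.
DATA_EXTENSIONS = %w[json yml yaml csv].freeze

# Returns the first "<name>.<ext>" that exists in folder, or nil.
# (Hypothetical helper: the gem's find_file raises instead of returning nil.)
def find_data_file(folder, name, exts = DATA_EXTENSIONS)
  exts.each do |ext|
    candidate = File.join(folder, "#{name}.#{ext}")
    return candidate if File.file?(candidate)
  end
  nil
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'suppliers.csv'), "sid,name\nS1,Smith\n")
  File.write(File.join(dir, 'suppliers.json'), '[{"sid":"S1","name":"Smith"}]')
  puts File.basename(find_data_file(dir, :suppliers)) # prints "suppliers.json"
end
```

Reordering the extension list is enough to change the precedence, which suggests the `data_extensions` option can serve that purpose in the real class.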
data/lib/bmg/database/sequel.rb
ADDED
@@ -0,0 +1,35 @@
|
|
1
|
+
module Bmg
|
2
|
+
class Database
|
3
|
+
class Sequel < Database
|
4
|
+
|
5
|
+
DEFAULT_OPTIONS = {
|
6
|
+
}
|
7
|
+
|
8
|
+
def initialize(sequel_db, options = {})
|
9
|
+
@sequel_db = sequel_db
|
10
|
+
@sequel_db = ::Sequel.connect(@sequel_db) unless @sequel_db.is_a?(::Sequel::Database)
|
11
|
+
end
|
12
|
+
|
13
|
+
def method_missing(name, *args, &bl)
|
14
|
+
return super(name, *args, &bl) unless args.empty? && bl.nil?
|
15
|
+
raise NotSuchRelationError, name.to_s unless @sequel_db.table_exists?(name)
|
16
|
+
rel_for(name)
|
17
|
+
end
|
18
|
+
|
19
|
+
def each_relation_pair
|
20
|
+
return to_enum(:each_relation_pair) unless block_given?
|
21
|
+
|
22
|
+
@sequel_db.tables.each do |table|
|
23
|
+
yield(table, rel_for(table))
|
24
|
+
end
|
25
|
+
end
|
26
|
+
|
27
|
+
protected
|
28
|
+
|
29
|
+
def rel_for(table_name)
|
30
|
+
Bmg.sequel(table_name, @sequel_db)
|
31
|
+
end
|
32
|
+
|
33
|
+
end # class Sequel
|
34
|
+
end # class Database
|
35
|
+
end # module Bmg
|
data/lib/bmg/database/xlsx.rb
ADDED
@@ -0,0 +1,41 @@
|
|
1
|
+
module Bmg
|
2
|
+
class Database
|
3
|
+
class Xlsx < Database
|
4
|
+
|
5
|
+
DEFAULT_OPTIONS = {
|
6
|
+
}
|
7
|
+
|
8
|
+
def initialize(path, options = {})
|
9
|
+
path = Path(path) if path.is_a?(String)
|
10
|
+
@path = path
|
11
|
+
@options = options.merge(DEFAULT_OPTIONS)
|
12
|
+
end
|
13
|
+
|
14
|
+
def method_missing(name, *args, &bl)
|
15
|
+
return super(name, *args, &bl) unless args.empty? && bl.nil?
|
16
|
+
rel = rel_for(name)
|
17
|
+
raise NotSuchRelationError, name.to_s unless rel
|
18
|
+
rel
|
19
|
+
end
|
20
|
+
|
21
|
+
def each_relation_pair
|
22
|
+
return to_enum(:each_relation_pair) unless block_given?
|
23
|
+
|
24
|
+
spreadsheet.sheets.each do |sheet_name|
|
25
|
+
yield(sheet_name.to_sym, rel_for(sheet_name))
|
26
|
+
end
|
27
|
+
end
|
28
|
+
|
29
|
+
protected
|
30
|
+
|
31
|
+
def spreadsheet
|
32
|
+
@spreadsheet ||= Roo::Spreadsheet.open(@path, @options)
|
33
|
+
end
|
34
|
+
|
35
|
+
def rel_for(sheet_name)
|
36
|
+
Bmg.excel(@path, { sheet: sheet_name.to_s })
|
37
|
+
end
|
38
|
+
|
39
|
+
end # class Xlsx
|
40
|
+
end # class Database
|
41
|
+
end # module Bmg
|
data/lib/bmg/database.rb
ADDED
@@ -0,0 +1,35 @@
|
|
1
|
+
module Bmg
|
2
|
+
class Database
|
3
|
+
|
4
|
+
def self.data_folder(*args)
|
5
|
+
require_relative 'database/data_folder'
|
6
|
+
DataFolder.new(*args)
|
7
|
+
end
|
8
|
+
|
9
|
+
def self.sequel(*args)
|
10
|
+
require 'bmg/sequel'
|
11
|
+
require_relative 'database/sequel'
|
12
|
+
Sequel.new(*args)
|
13
|
+
end
|
14
|
+
|
15
|
+
def self.xlsx(*args)
|
16
|
+
require 'bmg/xlsx'
|
17
|
+
require_relative 'database/xlsx'
|
18
|
+
Xlsx.new(*args)
|
19
|
+
end
|
20
|
+
|
21
|
+
def to_xlsx(*args)
|
22
|
+
require 'bmg/xlsx'
|
23
|
+
Writer::Xlsx.to_xlsx(self, *args)
|
24
|
+
end
|
25
|
+
|
26
|
+
def to_data_folder(*args)
|
27
|
+
DataFolder.dump(self, *args)
|
28
|
+
end
|
29
|
+
|
30
|
+
def each_relation_pair
|
31
|
+
raise NotImplementedError
|
32
|
+
end
|
33
|
+
|
34
|
+
end # class Database
|
35
|
+
end # module Bmg
|
data/lib/bmg/error.rb
CHANGED
data/lib/bmg/reader/excel.rb
CHANGED
@@ -1,80 +1 @@
|
|
1
|
-
|
2
|
-
module Reader
|
3
|
-
class Excel
|
4
|
-
include Reader
|
5
|
-
|
6
|
-
DEFAULT_OPTIONS = {
|
7
|
-
sheet: 0,
|
8
|
-
skip: 0,
|
9
|
-
row_num: true
|
10
|
-
}
|
11
|
-
|
12
|
-
def initialize(type, path, options = {})
|
13
|
-
require 'roo'
|
14
|
-
@path = path
|
15
|
-
@options = DEFAULT_OPTIONS.merge(options)
|
16
|
-
@type = type.knows_attrlist? ? type : type.with_attrlist(infer_attrlist)
|
17
|
-
end
|
18
|
-
|
19
|
-
def each
|
20
|
-
return to_enum unless block_given?
|
21
|
-
|
22
|
-
headers = type.attrlist
|
23
|
-
headers = headers[1..-1] if generate_row_num?
|
24
|
-
start_at = @options[:skip] + 2
|
25
|
-
end_at = spreadsheet.last_row
|
26
|
-
(start_at..end_at).each do |i|
|
27
|
-
row = spreadsheet.row(i)
|
28
|
-
init = init_tuple(i - start_at + 1)
|
29
|
-
tuple = (0...headers.size).each_with_object(init){|i,t|
|
30
|
-
t[headers[i]] = row[i]
|
31
|
-
}
|
32
|
-
yield(tuple)
|
33
|
-
end
|
34
|
-
end
|
35
|
-
|
36
|
-
def to_ast
|
37
|
-
[ :excel, @path, @options ]
|
38
|
-
end
|
39
|
-
|
40
|
-
def to_s
|
41
|
-
"(excel #{@path})"
|
42
|
-
end
|
43
|
-
alias :inspect :to_s
|
44
|
-
|
45
|
-
private
|
46
|
-
|
47
|
-
def spreadsheet
|
48
|
-
@spreadsheet ||= Roo::Spreadsheet
|
49
|
-
.open(@path, @options)
|
50
|
-
.sheet(@options[:sheet])
|
51
|
-
end
|
52
|
-
|
53
|
-
def infer_attrlist
|
54
|
-
row = spreadsheet.row(1+@options[:skip])
|
55
|
-
attrlist = row.map{|c| c.to_s.strip.to_sym }
|
56
|
-
attrlist.unshift(row_num_name) if generate_row_num?
|
57
|
-
attrlist
|
58
|
-
end
|
59
|
-
|
60
|
-
def generate_row_num?
|
61
|
-
!!@options[:row_num]
|
62
|
-
end
|
63
|
-
|
64
|
-
def row_num_name
|
65
|
-
case as = @options[:row_num]
|
66
|
-
when TrueClass then :row_num
|
67
|
-
when Symbol then as
|
68
|
-
else nil
|
69
|
-
end
|
70
|
-
end
|
71
|
-
|
72
|
-
def init_tuple(i)
|
73
|
-
return {} unless generate_row_num?
|
74
|
-
|
75
|
-
{ row_num_name => i }
|
76
|
-
end
|
77
|
-
|
78
|
-
end # class Excel
|
79
|
-
end # module Reader
|
80
|
-
end # module Bmg
|
1
|
+
require_relative 'xlsx'
|
data/lib/bmg/reader/xlsx.rb
ADDED
@@ -0,0 +1,80 @@
|
|
1
|
+
module Bmg
|
2
|
+
module Reader
|
3
|
+
class Excel
|
4
|
+
include Reader
|
5
|
+
|
6
|
+
DEFAULT_OPTIONS = {
|
7
|
+
sheet: 0,
|
8
|
+
skip: 0,
|
9
|
+
row_num: true
|
10
|
+
}
|
11
|
+
|
12
|
+
def initialize(type, path, options = {})
|
13
|
+
require 'roo'
|
14
|
+
@path = path
|
15
|
+
@options = DEFAULT_OPTIONS.merge(options)
|
16
|
+
@type = type.knows_attrlist? ? type : type.with_attrlist(infer_attrlist)
|
17
|
+
end
|
18
|
+
|
19
|
+
def each
|
20
|
+
return to_enum unless block_given?
|
21
|
+
|
22
|
+
headers = type.attrlist
|
23
|
+
headers = headers[1..-1] if generate_row_num?
|
24
|
+
start_at = @options[:skip] + 2
|
25
|
+
end_at = spreadsheet.last_row
|
26
|
+
(start_at..end_at).each do |i|
|
27
|
+
row = spreadsheet.row(i)
|
28
|
+
init = init_tuple(i - start_at + 1)
|
29
|
+
tuple = (0...headers.size).each_with_object(init){|i,t|
|
30
|
+
t[headers[i]] = row[i]
|
31
|
+
}
|
32
|
+
yield(tuple)
|
33
|
+
end
|
34
|
+
end
|
35
|
+
|
36
|
+
def to_ast
|
37
|
+
[ :excel, @path, @options ]
|
38
|
+
end
|
39
|
+
|
40
|
+
def to_s
|
41
|
+
"(excel #{@path})"
|
42
|
+
end
|
43
|
+
alias :inspect :to_s
|
44
|
+
|
45
|
+
private
|
46
|
+
|
47
|
+
def spreadsheet
|
48
|
+
@spreadsheet ||= Roo::Spreadsheet
|
49
|
+
.open(@path, @options)
|
50
|
+
.sheet(@options[:sheet])
|
51
|
+
end
|
52
|
+
|
53
|
+
def infer_attrlist
|
54
|
+
row = spreadsheet.row(1+@options[:skip])
|
55
|
+
attrlist = row.map{|c| c.to_s.strip.to_sym }
|
56
|
+
attrlist.unshift(row_num_name) if generate_row_num?
|
57
|
+
attrlist
|
58
|
+
end
|
59
|
+
|
60
|
+
def generate_row_num?
|
61
|
+
!!@options[:row_num]
|
62
|
+
end
|
63
|
+
|
64
|
+
def row_num_name
|
65
|
+
case as = @options[:row_num]
|
66
|
+
when TrueClass then :row_num
|
67
|
+
when Symbol then as
|
68
|
+
else nil
|
69
|
+
end
|
70
|
+
end
|
71
|
+
|
72
|
+
def init_tuple(i)
|
73
|
+
return {} unless generate_row_num?
|
74
|
+
|
75
|
+
{ row_num_name => i }
|
76
|
+
end
|
77
|
+
|
78
|
+
end # class Excel
|
79
|
+
end # module Reader
|
80
|
+
end # module Bmg
|
data/lib/bmg/sequel.rb
CHANGED
data/lib/bmg/summarizer/bucketize.rb
ADDED
@@ -0,0 +1,82 @@
|
|
1
|
+
module Bmg
|
2
|
+
class Summarizer
|
3
|
+
#
|
4
|
+
# Bucketizer summarizer.
|
5
|
+
#
|
6
|
+
# Example:
|
7
|
+
#
|
8
|
+
# # direct ruby usage
|
9
|
+
# Bmg::Summarizer.bucketize(:qty, :size => 2).summarize(...)
|
10
|
+
#
|
11
|
+
class Bucketize < Summarizer
|
12
|
+
|
13
|
+
# Sets default options.
|
14
|
+
def default_options
|
15
|
+
{ :size => 10 }
|
16
|
+
end
|
17
|
+
|
18
|
+
# Returns the initial memo (a pair of arrays; values are collected in the first)
|
19
|
+
def least()
|
20
|
+
[[], []]
|
21
|
+
end
|
22
|
+
|
23
|
+
# Appends val to the memo's first array
|
24
|
+
def _happens(memo, val)
|
25
|
+
memo.first << val
|
26
|
+
memo
|
27
|
+
end
|
28
|
+
|
29
|
+
# Finalizes computation
|
30
|
+
def finalize(memo)
|
31
|
+
buckets = compute_buckets(memo.first, options[:size])
|
32
|
+
buckets = touching_buckets(buckets) if options[:boundaries] == :touching
|
33
|
+
buckets
|
34
|
+
end
|
35
|
+
|
36
|
+
private
|
37
|
+
|
38
|
+
def compute_buckets(values, num_buckets = 10)
|
39
|
+
sorted_values = values.compact.sort
|
40
|
+
sorted_values = sorted_values.map{|v| v.to_s[0...options[:value_length]] } if options[:value_length]
|
41
|
+
sorted_values = sorted_values.uniq if options[:distinct]
|
42
|
+
|
43
|
+
# Calculate the size of each bucket
|
44
|
+
total_values = sorted_values.length
|
45
|
+
bucket_size = (total_values / num_buckets.to_f).ceil
|
46
|
+
|
47
|
+
# Create the ranges for each bucket
|
48
|
+
bucket_ranges = []
|
49
|
+
num_buckets.times do |i|
|
50
|
+
start_index = i * bucket_size
|
51
|
+
break if start_index >= total_values # Ensure we do not exceed the array bounds
|
52
|
+
|
53
|
+
end_index = [(start_index + bucket_size - 1), total_values - 1].min
|
54
|
+
start_value = sorted_values[start_index]
|
55
|
+
end_value = sorted_values[end_index]
|
56
|
+
bucket_ranges << (start_value..end_value)
|
57
|
+
end
|
58
|
+
|
59
|
+
bucket_ranges
|
60
|
+
end
|
61
|
+
|
62
|
+
def touching_buckets(buckets)
|
63
|
+
result = []
|
64
|
+
buckets.each do |b|
|
65
|
+
r_start = result.empty? ? b.begin : result.last.end
|
66
|
+
r_end = b.end
|
67
|
+
result << (r_start...r_end)
|
68
|
+
end
|
69
|
+
result[-1] = (result.last.begin..result.last.end)
|
70
|
+
|
71
|
+
result
|
72
|
+
end
|
73
|
+
|
74
|
+
end # class Bucketize
|
75
|
+
|
76
|
+
# Factors a bucketize summarizer
|
77
|
+
def self.bucketize(*args, &bl)
|
78
|
+
Bucketize.new(*args, &bl)
|
79
|
+
end
|
80
|
+
|
81
|
+
end # class Summarizer
|
82
|
+
end # module Bmg
|
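Note on the hunk above: `compute_buckets` sorts the collected values, splits them into at most `size` groups of equal count, and returns one range per group. With `boundaries: :touching`, each range then starts where the previous one ended. The following is a standalone sketch of that logic with no Bmg dependency. `equal_count_buckets` and `touching` are illustrative names, the `value_length` and `distinct` options are left out, and the guards for empty input are an addition.

```ruby
# Splits values into at most num_buckets ranges holding (roughly) the same
# number of values each. Mirrors compute_buckets in the hunk above.
def equal_count_buckets(values, num_buckets = 10)
  sorted = values.compact.sort
  return [] if sorted.empty?

  bucket_size = (sorted.length / num_buckets.to_f).ceil
  sorted.each_slice(bucket_size).map { |slice| (slice.first..slice.last) }
end

# Rewrites buckets so that each one starts at the previous bucket's end.
# All ranges but the last exclude their end value. Mirrors touching_buckets.
def touching(buckets)
  return [] if buckets.empty?

  result = []
  buckets.each do |b|
    r_start = result.empty? ? b.begin : result.last.end
    result << (r_start...b.end)
  end
  result[-1] = (result.last.begin..result.last.end)
  result
end

buckets = equal_count_buckets((1..10).to_a, 3)
p buckets           # [1..4, 5..8, 9..10]
p touching(buckets) # [1...4, 4...8, 8..10]
```

Note that the touching form moves each boundary value into the next bucket: in the example, 4 belongs to `4...8` rather than to the first range.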
data/lib/bmg/summarizer.rb
CHANGED
data/lib/bmg/version.rb
CHANGED
data/lib/bmg/writer/xlsx.rb
CHANGED
@@ -7,22 +7,36 @@ module Bmg
|
|
7
7
|
}
|
8
8
|
|
9
9
|
def initialize(xlsx_options, output_preferences = nil)
|
10
|
+
require 'write_xlsx'
|
10
11
|
@xlsx_options = DEFAULT_OPTIONS.merge(xlsx_options)
|
11
12
|
@output_preferences = OutputPreferences.dress(output_preferences)
|
12
13
|
end
|
13
14
|
attr_reader :xlsx_options, :output_preferences
|
14
15
|
|
15
16
|
def call(relation, path)
|
16
|
-
require 'write_xlsx'
|
17
17
|
dup._call(relation, path)
|
18
18
|
end
|
19
19
|
|
20
|
+
def self.to_xlsx(database, path)
|
21
|
+
require 'write_xlsx'
|
22
|
+
workbook = WriteXLSX.new(path)
|
23
|
+
database.each_relation_pair do |name, rel|
|
24
|
+
worksheet = workbook.add_worksheet(name)
|
25
|
+
rel.to_xlsx({
|
26
|
+
workbook: workbook,
|
27
|
+
worksheet: worksheet,
|
28
|
+
})
|
29
|
+
end
|
30
|
+
workbook.close
|
31
|
+
end
|
32
|
+
|
20
33
|
protected
|
21
34
|
attr_reader :workbook, :worksheet
|
22
35
|
|
23
36
|
def _call(relation, path)
|
24
37
|
@workbook = xlsx_options[:workbook] || WriteXLSX.new(path)
|
25
38
|
@worksheet = xlsx_options[:worksheet] || workbook.add_worksheet
|
39
|
+
@worksheet = workbook.add_worksheet(@worksheet) if @worksheet.is_a?(String)
|
26
40
|
|
27
41
|
headers = infer_headers(relation.type)
|
28
42
|
before = nil
|
data/lib/bmg/xlsx.rb
ADDED
data/lib/bmg.rb
CHANGED
@@ -24,6 +24,16 @@ module Bmg
|
|
24
24
|
end
|
25
25
|
module_function :csv
|
26
26
|
|
27
|
+
def json(path, options = {}, type = Type::ANY)
|
28
|
+
in_memory(path.load.map{|tuple| TupleAlgebra.symbolize_keys(tuple) })
|
29
|
+
end
|
30
|
+
module_function :json
|
31
|
+
|
32
|
+
def yaml(path, options = {}, type = Type::ANY)
|
33
|
+
in_memory(path.load.map{|tuple| TupleAlgebra.symbolize_keys(tuple) })
|
34
|
+
end
|
35
|
+
module_function :yaml
|
36
|
+
|
27
37
|
def excel(path, options = {}, type = Type::ANY)
|
28
38
|
Reader::Excel.new(type, path, options).spied(main_spy)
|
29
39
|
end
|
@@ -57,6 +67,8 @@ module Bmg
|
|
57
67
|
require_relative 'bmg/relation/materialized'
|
58
68
|
require_relative 'bmg/relation/proxy'
|
59
69
|
|
70
|
+
require_relative 'bmg/database'
|
71
|
+
|
60
72
|
# Deprecated
|
61
73
|
Leaf = Relation::InMemory
|
62
74
|
end
|
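Note on the hunk above: the new `Bmg.json` and `Bmg.yaml` load the whole file, symbolize each tuple's keys and wrap the result in an in-memory relation. The README asks for "tuples of same heading", meaning every object should carry the same attribute names. The following is a minimal sketch of both ideas using the standard `json` library. `load_tuples` and `same_heading?` are illustrative names, and the shallow `transform_keys` is only an assumed stand-in for `TupleAlgebra.symbolize_keys`.

```ruby
require 'json'

# Parses a JSON array of objects into tuples with symbol keys.
# (Shallow conversion; an assumption about what symbolize_keys does.)
def load_tuples(json_text)
  JSON.parse(json_text).map { |tuple| tuple.transform_keys(&:to_sym) }
end

# True when all tuples share the same set of attribute names.
def same_heading?(tuples)
  tuples.map { |t| t.keys.sort }.uniq.size <= 1
end

tuples = load_tuples('[{"sid":"S1","city":"London"},{"sid":"S2","city":"Paris"}]')
p tuples.first          # {:sid=>"S1", :city=>"London"} (Hash#inspect format varies by Ruby version)
p same_heading?(tuples) # true
```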
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: bmg
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.23.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Bernard Lambeau
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2024-
|
11
|
+
date: 2024-06-27 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: predicate
|
@@ -142,6 +142,10 @@ files:
|
|
142
142
|
- lib/bmg.rb
|
143
143
|
- lib/bmg/algebra.rb
|
144
144
|
- lib/bmg/algebra/shortcuts.rb
|
145
|
+
- lib/bmg/database.rb
|
146
|
+
- lib/bmg/database/data_folder.rb
|
147
|
+
- lib/bmg/database/sequel.rb
|
148
|
+
- lib/bmg/database/xlsx.rb
|
145
149
|
- lib/bmg/error.rb
|
146
150
|
- lib/bmg/operator.rb
|
147
151
|
- lib/bmg/operator/allbut.rb
|
@@ -172,6 +176,7 @@ files:
|
|
172
176
|
- lib/bmg/reader/csv.rb
|
173
177
|
- lib/bmg/reader/excel.rb
|
174
178
|
- lib/bmg/reader/text_file.rb
|
179
|
+
- lib/bmg/reader/xlsx.rb
|
175
180
|
- lib/bmg/relation.rb
|
176
181
|
- lib/bmg/relation/empty.rb
|
177
182
|
- lib/bmg/relation/in_memory.rb
|
@@ -272,6 +277,7 @@ files:
|
|
272
277
|
- lib/bmg/sql/version.rb
|
273
278
|
- lib/bmg/summarizer.rb
|
274
279
|
- lib/bmg/summarizer/avg.rb
|
280
|
+
- lib/bmg/summarizer/bucketize.rb
|
275
281
|
- lib/bmg/summarizer/by_proc.rb
|
276
282
|
- lib/bmg/summarizer/collect.rb
|
277
283
|
- lib/bmg/summarizer/concat.rb
|
@@ -300,6 +306,7 @@ files:
|
|
300
306
|
- lib/bmg/writer.rb
|
301
307
|
- lib/bmg/writer/csv.rb
|
302
308
|
- lib/bmg/writer/xlsx.rb
|
309
|
+
- lib/bmg/xlsx.rb
|
303
310
|
- tasks/gem.rake
|
304
311
|
- tasks/test.rake
|
305
312
|
homepage: http://github.com/enspirit/bmg
|