bmg 0.22.0 → 0.23.0
- checksums.yaml +4 -4
- data/README.md +155 -16
- data/lib/bmg/database/data_folder.rb +67 -0
- data/lib/bmg/database/sequel.rb +35 -0
- data/lib/bmg/database/xlsx.rb +41 -0
- data/lib/bmg/database.rb +35 -0
- data/lib/bmg/error.rb +3 -0
- data/lib/bmg/reader/excel.rb +1 -80
- data/lib/bmg/reader/xlsx.rb +80 -0
- data/lib/bmg/sequel.rb +1 -0
- data/lib/bmg/summarizer/bucketize.rb +82 -0
- data/lib/bmg/summarizer/concat.rb +1 -1
- data/lib/bmg/summarizer.rb +1 -0
- data/lib/bmg/version.rb +1 -1
- data/lib/bmg/writer/xlsx.rb +15 -1
- data/lib/bmg/xlsx.rb +3 -0
- data/lib/bmg.rb +12 -0
- metadata +9 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 542c4218634ee7ae5b224400ee07ec6d2998473a
|
4
|
+
data.tar.gz: 37eddfc05f9fcfca90e96c50fc6628a09cdf8742
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 9c5009a14fa10be21fc6d76e23fb47a8e0e69e9dd300478822c6bc9758230ff593cb0a3783b94b0a64c4afbd82456b362ee081a02e4a049c65759c9699a3f074
|
7
|
+
data.tar.gz: 5d73d4dad1eb72f3794cbcfddab422e01a197d9c128f6991243cf7d381e41f9ae0cf255f03aff5eb23ac3d4375de1da1345503e086ff01253dd6e636497727b4
|
data/README.md
CHANGED
@@ -1,6 +1,6 @@
|
|
1
|
-
# Bmg, a relational algebra
|
1
|
+
# Bmg, a relational algebra
|
2
2
|
|
3
|
-
Bmg is a relational algebra implemented as a Ruby library. It implements the
|
3
|
+
Bmg is a [relational algebra](https://www.relational-algebra.dev/) implemented as a Ruby library. It implements the
|
4
4
|
[Relation as First-Class Citizen](http://www.try-alf.org/blog/2013-10-21-relations-as-first-class-citizen)
|
5
5
|
paradigm contributed with [Alf](http://www.try-alf.org/) a few years ago.
|
6
6
|
|
@@ -9,16 +9,24 @@ and any data source that can be seen as serving relations. Cross data-sources
|
|
9
9
|
joins are supported, as with Alf. For differences with Alf, see a section
|
10
10
|
further down this README.
|
11
11
|
|
12
|
+
## Links
|
13
|
+
|
14
|
+
* Documentation can be found at https://www.relational-algebra.dev/
|
15
|
+
* Contribute to that documentation on github: https://github.com/enspirit/bmg-website
|
16
|
+
|
12
17
|
## Outline
|
13
18
|
|
14
19
|
* [Example](#example)
|
15
20
|
* [Where are base relations coming from?](#where-are-base-relations-coming-from)
|
16
21
|
* [Memory relations](#memory-relations)
|
17
22
|
* [Connecting to SQL databases](#connecting-to-sql-databases)
|
18
|
-
* [Reading files
|
23
|
+
* [Reading data files](#reading-data-files-json-csv-yaml-text-xls--xlsx)
|
19
24
|
* [Connecting to Redis databases](#connecting-to-redis-databases)
|
20
25
|
* [Your own relations](#your-own-relations)
|
26
|
+
* [The Database abstraction](#the-database-abstraction)
|
21
27
|
* [List of supported operators](#supported-operators)
|
28
|
+
* [List of supported predicates](#supported-predicates)
|
29
|
+
* [List of supported summaries](#supported-summaries)
|
22
30
|
* [How is this different?](#how-is-this-different)
|
23
31
|
* [... from similar libraries](#-from-similar-libraries)
|
24
32
|
* [... from Alf](#-from-alf)
|
@@ -117,33 +125,38 @@ Bmg.sequel(:suppliers, sequel_db)
|
|
117
125
|
# {:array=>false})
|
118
126
|
```
|
119
127
|
|
120
|
-
### Reading files (csv,
|
128
|
+
### Reading data files (json, csv, yaml, text, xls & xlsx)
|
121
129
|
|
122
130
|
Bmg provides simple adapters to read files and reach Relationland as soon as
|
123
131
|
possible.
|
124
132
|
|
125
|
-
####
|
133
|
+
#### JSON files
|
126
134
|
|
127
135
|
```ruby
|
128
|
-
|
129
|
-
r = Bmg.csv("path/to/a/file.csv", csv_options)
|
136
|
+
r = Bmg.json("path/to/a/file.json")
|
130
137
|
```
|
131
138
|
|
132
|
-
|
133
|
-
library.
|
139
|
+
The json file is expected to contain tuples with the same heading.
|
134
140
|
|
135
|
-
####
|
141
|
+
#### YAML files
|
136
142
|
|
137
|
-
|
138
|
-
|
143
|
+
```ruby
|
144
|
+
r = Bmg.yaml("path/to/a/file.yaml")
|
145
|
+
```
|
146
|
+
|
147
|
+
The yaml file is expected to contain tuples with the same heading.
|
148
|
+
|
149
|
+
#### CSV files
|
139
150
|
|
140
151
|
```ruby
|
141
|
-
|
142
|
-
r = Bmg.
|
152
|
+
csv_options = { col_sep: ",", quote_char: '"' }
|
153
|
+
r = Bmg.csv("path/to/a/file.csv", csv_options)
|
143
154
|
```
|
144
155
|
|
145
|
-
Options are directly transmitted to
|
146
|
-
|
156
|
+
Options are directly transmitted to `::CSV.new`, check Ruby's standard
|
157
|
+
library. If you don't provide them, `Bmg` uses `headers: true` (hence making
|
158
|
+
the assumption that attribute names are provided on the first line), and makes a
|
159
|
+
best effort to infer the column separator.
|
147
160
|
|
148
161
|
#### Text files
|
149
162
|
|
@@ -173,6 +186,19 @@ r.type.attrlist
|
|
173
186
|
In this scenario, non matching lines are skipped. The `:line` attribute keeps
|
174
187
|
being used to have at least one candidate key (so to speak).
|
175
188
|
|
189
|
+
#### Excel files
|
190
|
+
|
191
|
+
You will need to add [`roo`](https://github.com/roo-rb/roo) to your Gemfile to
|
192
|
+
read `.xls` and `.xlsx` files with Bmg.
|
193
|
+
|
194
|
+
```ruby
|
195
|
+
roo_options = { skip: 1 }
|
196
|
+
r = Bmg.excel("path/to/a/file.xls", roo_options)
|
197
|
+
```
|
198
|
+
|
199
|
+
Options are directly transmitted to `Roo::Spreadsheet.open`, check roo's
|
200
|
+
documentation.
|
201
|
+
|
176
202
|
### Connecting to Redis databases
|
177
203
|
|
178
204
|
Bmg currently requires `bmg-redis` and `redis >= 4.6` to connect
|
@@ -240,6 +266,58 @@ restrictions down the tree) by overriding the underscored version of operators
|
|
240
266
|
Have a look at `Bmg::Algebra` for the protocol and `Bmg::Sql::Relation` for an
|
241
267
|
example. Keep in touch with the team if you need some help.
|
242
268
|
|
269
|
+
## The Database abstraction
|
270
|
+
|
271
|
+
The previous section focused on obtaining *relations*. In practice you frequently
|
272
|
+
have a collection of relations, hence a *database*:
|
273
|
+
|
274
|
+
* A SQL database with multiple tables
|
275
|
+
* A list of data files, all in the same folder
|
276
|
+
* An excel file with various sheets
|
277
|
+
|
278
|
+
Bmg supports a simple Database abstraction that serves those relations "by name",
|
279
|
+
in a simple way. A database can also be easily dumped back to a data folder of
|
280
|
+
json or csv files, or as simple xlsx files with multiple sheets.
|
281
|
+
|
282
|
+
### Connecting to a SQL Database
|
283
|
+
|
284
|
+
For a SQL database, connected with Sequel:
|
285
|
+
|
286
|
+
```
|
287
|
+
db = Bmg::Database.sequel(Sequel.connect('...'))
|
288
|
+
db.suppliers # yields a Bmg::Relation over the `suppliers` table
|
289
|
+
```
|
290
|
+
|
291
|
+
### Connecting to data files in the same folder
|
292
|
+
|
293
|
+
Data files all in the same folder can be seen as a very basic form of database,
|
294
|
+
and served as such. Bmg supports `json`, `csv` and `yaml` files:
|
295
|
+
|
296
|
+
```
|
297
|
+
db = Bmg::Database.data_folder('./my-database')
|
298
|
+
db.suppliers # yields a Bmg::Relation over the `suppliers.(json,csv,yml)` file
|
299
|
+
```
|
300
|
+
|
301
|
+
Bmg supports files in different formats in the same folder. When files with the
|
302
|
+
same basename exist, json is preferred over yaml, which is preferred over csv.
|
303
|
+
|
304
|
+
### Dumping a Database instance
|
305
|
+
|
306
|
+
As a data folder:
|
307
|
+
|
308
|
+
```
|
309
|
+
db = Bmg::Database.sequel(Sequel.connect('...'))
|
310
|
+
db.to_data_folder('path/to/folder', :json)
|
311
|
+
```
|
312
|
+
|
313
|
+
As an .xlsx file (any existing file will be erased, we don't support modifying
|
314
|
+
existing files):
|
315
|
+
|
316
|
+
```
|
317
|
+
require 'bmg/xlsx'
|
318
|
+
db.to_xlsx('path/to/file.xlsx')
|
319
|
+
```
|
320
|
+
|
243
321
|
## Supported operators
|
244
322
|
|
245
323
|
```ruby
|
@@ -283,6 +361,67 @@ r.unwrap(:a) # shortcut over unwrap([:a])
|
|
283
361
|
r.where(predicate) # alias for restrict(predicate)
|
284
362
|
```
|
285
363
|
|
364
|
+
## Supported Predicates
|
365
|
+
|
366
|
+
Usual operators are supported and map to their SQL equivalent as expected:
|
367
|
+
|
368
|
+
```ruby
|
369
|
+
Predicate.eq # =
|
370
|
+
Predicate.neq # <>
|
371
|
+
Predicate.lt # <
|
372
|
+
Predicate.lte # <=
|
373
|
+
Predicate.gt # >
|
374
|
+
Predicate.gte # >=
|
375
|
+
Predicate.in # SQL's IN
|
376
|
+
Predicate.is_null # SQL's IS NULL
|
377
|
+
```
|
378
|
+
|
379
|
+
See the [Predicate gem](https://github.com/enspirit/predicate) for a more
|
380
|
+
complete list.
|
381
|
+
|
382
|
+
Note: predicates that implement specific Ruby algorithms or patterns are
|
383
|
+
not compiled to SQL (and more generally not delegated to underlying database
|
384
|
+
servers).
|
385
|
+
|
386
|
+
## Supported Summaries
|
387
|
+
|
388
|
+
The `summarize` operator receives a list of `attr: summarizer` pairs, e.g.
|
389
|
+
|
390
|
+
```ruby
|
391
|
+
r.summarize([:city], {
|
392
|
+
how_many: :count, # same as how_many: Bmg::Summarizer.count
|
393
|
+
status: :max, # same as status: Bmg::Summarizer.max(:status)
|
394
|
+
min_status: Bmg::Summarizer.min(:status)
|
395
|
+
})
|
396
|
+
```
|
397
|
+
|
398
|
+
The following summarizers are available and translated to SQL:
|
399
|
+
|
400
|
+
```ruby
|
401
|
+
Bmg::Summarizer.count # count the number of tuples
|
402
|
+
Bmg::Summarizer.distinct(:a) # collect distinct values (as an array)
|
403
|
+
Bmg::Summarizer.distinct_count(:a) # count of distinct values
|
404
|
+
Bmg::Summarizer.min(:a) # min value for attribute :a
|
405
|
+
Bmg::Summarizer.max(:a) # max value
|
406
|
+
Bmg::Summarizer.sum(:a) # sum :a's values
|
407
|
+
Bmg::Summarizer.avg(:a) # average
|
408
|
+
```
|
409
|
+
|
410
|
+
The following summarizers are implemented in Ruby (they are supported when
|
411
|
+
querying SQL databases, but not compiled to SQL):
|
412
|
+
|
413
|
+
```ruby
|
414
|
+
Bmg::Summarizer.collect(:a) # collect :a's values (as an array)
|
415
|
+
Bmg::Summarizer.concat(:a, opts: { ... }) # concat :a's values (opts, e.g. {between: ','})
|
416
|
+
Bmg::Summarizer.first(:a, order: ...) # smallest :a value seen, according to a tuple ordering
|
417
|
+
Bmg::Summarizer.last(:a, order: ...) # largest :a value seen, according to a tuple ordering
|
418
|
+
Bmg::Summarizer.variance(:a) # variance
|
419
|
+
Bmg::Summarizer.stddev(:a) # standard deviation
|
420
|
+
Bmg::Summarizer.percentile(:a, nth) # (continuous) nth percentile
|
421
|
+
Bmg::Summarizer.percentile_disc(:a, nth) # discrete nth percentile
|
422
|
+
Bmg::Summarizer.value_by(:a, :by => :b) # { :b => :a } as a Hash
|
423
|
+
```
|
424
|
+
|
286
425
|
## How is this different?
|
287
426
|
|
288
427
|
### ... from similar libraries?
|
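As a plain-Ruby illustration of what the `summarize([:city], ...)` example in the README hunk above computes, here is a minimal sketch. It does not use Bmg: `summarize_by_city` and the `SUPPLIERS` sample tuples are hypothetical, and the real operator works on relations rather than on arrays of hashes.

```ruby
# Hypothetical stand-in for
#   r.summarize([:city], how_many: :count, status: :max,
#               min_status: Bmg::Summarizer.min(:status))
# It works on a plain array of hashes, not on a Bmg relation.
def summarize_by_city(tuples)
  tuples.group_by { |t| t[:city] }.map do |city, group|
    statuses = group.map { |t| t[:status] }
    {
      city: city,
      how_many: group.size,      # :count
      status: statuses.max,      # :max over :status
      min_status: statuses.min   # min(:status)
    }
  end
end

# Sample data, made up for this sketch.
SUPPLIERS = [
  { sid: "S1", city: "London", status: 20 },
  { sid: "S2", city: "Paris",  status: 10 },
  { sid: "S3", city: "Paris",  status: 30 },
  { sid: "S4", city: "London", status: 20 }
].freeze
```

Calling `summarize_by_city(SUPPLIERS)` yields one tuple per city, each carrying the three summarized attributes.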
data/lib/bmg/database/data_folder.rb
ADDED
@@ -0,0 +1,67 @@
|
|
1
|
+
module Bmg
|
2
|
+
class Database
|
3
|
+
class DataFolder < Database
|
4
|
+
|
5
|
+
DEFAULT_OPTIONS = {
|
6
|
+
data_extensions: ['json', 'yml', 'yaml', 'csv']
|
7
|
+
}
|
8
|
+
|
9
|
+
def initialize(folder, options = {})
|
10
|
+
@folder = Path(folder)
|
11
|
+
@options = DEFAULT_OPTIONS.merge(options)
|
12
|
+
end
|
13
|
+
|
14
|
+
def method_missing(name, *args, &bl)
|
15
|
+
return super(name, *args, &bl) unless args.empty? && bl.nil?
|
16
|
+
raise NotSuchRelationError, name.to_s unless file = find_file(name)
|
17
|
+
read_file(file)
|
18
|
+
end
|
19
|
+
|
20
|
+
def each_relation_pair
|
21
|
+
return to_enum(:each_relation_pair) unless block_given?
|
22
|
+
|
23
|
+
@folder.glob('*') do |path|
|
24
|
+
next unless path.file?
|
25
|
+
next unless @options[:data_extensions].find {|ext|
|
26
|
+
path.ext == ".#{ext}" || path.ext == ext
|
27
|
+
}
|
28
|
+
yield(path.basename.rm_ext.to_sym, read_file(path))
|
29
|
+
end
|
30
|
+
end
|
31
|
+
|
32
|
+
def self.dump(database, path, ext = :json)
|
33
|
+
path = Path(path)
|
34
|
+
path.mkdir_p
|
35
|
+
database.each_relation_pair do |name, rel|
|
36
|
+
(path/"#{name}.#{ext}").write(rel.public_send(:"to_#{ext}"))
|
37
|
+
end
|
38
|
+
path
|
39
|
+
end
|
40
|
+
|
41
|
+
private
|
42
|
+
|
43
|
+
def read_file(file)
|
44
|
+
case file.ext.to_s
|
45
|
+
when '.json'
|
46
|
+
Bmg.json(file)
|
47
|
+
when '.yaml', '.yml'
|
48
|
+
Bmg.yaml(file)
|
49
|
+
when '.csv'
|
50
|
+
Bmg.csv(file)
|
51
|
+
else
|
52
|
+
raise NotSupportedError, "Unable to use #{file} as a relation"
|
53
|
+
end
|
54
|
+
end
|
55
|
+
|
56
|
+
def find_file(name)
|
57
|
+
exts = @options[:data_extensions]
|
58
|
+
exts.each do |ext|
|
59
|
+
target = @folder/"#{name}.#{ext}"
|
60
|
+
return target if target.file?
|
61
|
+
end
|
62
|
+
raise NotSuchRelationError, "#{@folder}/#{name}.#{exts.join(',')}"
|
63
|
+
end
|
64
|
+
|
65
|
+
end # class DataFolder
|
66
|
+
end # class Database
|
67
|
+
end # module Bmg
|
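The extension precedence implemented by `find_file` above (first match in `data_extensions` wins) can be sketched with the standard library alone. This is an illustration under assumptions: `find_data_file` and `resolve_in_tmp_folder` are hypothetical helpers, they use `File` rather than `Path`, and they return `nil` where the hunk raises `NotSuchRelationError`.

```ruby
require "tmpdir"

# Same order as DEFAULT_OPTIONS[:data_extensions] in the hunk above.
DATA_EXTENSIONS = %w[json yml yaml csv].freeze

# Hypothetical helper mirroring DataFolder#find_file: try each extension in
# order and return the first existing file, or nil when none matches.
def find_data_file(folder, name, extensions = DATA_EXTENSIONS)
  extensions.each do |ext|
    target = File.join(folder, "#{name}.#{ext}")
    return target if File.file?(target)
  end
  nil
end

# Builds a throwaway folder holding the given (empty) files, then resolves
# `name` in it and returns the basename of the file that was picked.
def resolve_in_tmp_folder(file_names, name)
  Dir.mktmpdir do |folder|
    file_names.each { |f| File.write(File.join(folder, f), "") }
    found = find_data_file(folder, name)
    found && File.basename(found)
  end
end
```

With both `suppliers.csv` and `suppliers.json` present, the json file is the one resolved, which matches the precedence described in the README.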
data/lib/bmg/database/sequel.rb
ADDED
@@ -0,0 +1,35 @@
|
|
1
|
+
module Bmg
|
2
|
+
class Database
|
3
|
+
class Sequel < Database
|
4
|
+
|
5
|
+
DEFAULT_OPTIONS = {
|
6
|
+
}
|
7
|
+
|
8
|
+
def initialize(sequel_db, options = {})
|
9
|
+
@sequel_db = sequel_db
|
10
|
+
@sequel_db = ::Sequel.connect(@sequel_db) unless @sequel_db.is_a?(::Sequel::Database)
|
11
|
+
end
|
12
|
+
|
13
|
+
def method_missing(name, *args, &bl)
|
14
|
+
return super(name, *args, &bl) unless args.empty? && bl.nil?
|
15
|
+
raise NotSuchRelationError, name.to_s unless @sequel_db.table_exists?(name)
|
16
|
+
rel_for(name)
|
17
|
+
end
|
18
|
+
|
19
|
+
def each_relation_pair
|
20
|
+
return to_enum(:each_relation_pair) unless block_given?
|
21
|
+
|
22
|
+
@sequel_db.tables.each do |table|
|
23
|
+
yield(table, rel_for(table))
|
24
|
+
end
|
25
|
+
end
|
26
|
+
|
27
|
+
protected
|
28
|
+
|
29
|
+
def rel_for(table_name)
|
30
|
+
Bmg.sequel(table_name, @sequel_db)
|
31
|
+
end
|
32
|
+
|
33
|
+
end # class Sequel
|
34
|
+
end # class Database
|
35
|
+
end # module Bmg
|
data/lib/bmg/database/xlsx.rb
ADDED
@@ -0,0 +1,41 @@
|
|
1
|
+
module Bmg
|
2
|
+
class Database
|
3
|
+
class Xlsx < Database
|
4
|
+
|
5
|
+
DEFAULT_OPTIONS = {
|
6
|
+
}
|
7
|
+
|
8
|
+
def initialize(path, options = {})
|
9
|
+
path = Path(path) if path.is_a?(String)
|
10
|
+
@path = path
|
11
|
+
@options = DEFAULT_OPTIONS.merge(options)
|
12
|
+
end
|
13
|
+
|
14
|
+
def method_missing(name, *args, &bl)
|
15
|
+
return super(name, *args, &bl) unless args.empty? && bl.nil?
|
16
|
+
rel = rel_for(name)
|
17
|
+
raise NotSuchRelationError, name.to_s unless rel
|
18
|
+
rel
|
19
|
+
end
|
20
|
+
|
21
|
+
def each_relation_pair
|
22
|
+
return to_enum(:each_relation_pair) unless block_given?
|
23
|
+
|
24
|
+
spreadsheet.sheets.each do |sheet_name|
|
25
|
+
yield(sheet_name.to_sym, rel_for(sheet_name))
|
26
|
+
end
|
27
|
+
end
|
28
|
+
|
29
|
+
protected
|
30
|
+
|
31
|
+
def spreadsheet
|
32
|
+
@spreadsheet ||= Roo::Spreadsheet.open(@path, @options)
|
33
|
+
end
|
34
|
+
|
35
|
+
def rel_for(sheet_name)
|
36
|
+
Bmg.excel(@path, { sheet: sheet_name.to_s })
|
37
|
+
end
|
38
|
+
|
39
|
+
end # class Xlsx
|
40
|
+
end # class Database
|
41
|
+
end # module Bmg
|
data/lib/bmg/database.rb
ADDED
@@ -0,0 +1,35 @@
|
|
1
|
+
module Bmg
|
2
|
+
class Database
|
3
|
+
|
4
|
+
def self.data_folder(*args)
|
5
|
+
require_relative 'database/data_folder'
|
6
|
+
DataFolder.new(*args)
|
7
|
+
end
|
8
|
+
|
9
|
+
def self.sequel(*args)
|
10
|
+
require 'bmg/sequel'
|
11
|
+
require_relative 'database/sequel'
|
12
|
+
Sequel.new(*args)
|
13
|
+
end
|
14
|
+
|
15
|
+
def self.xlsx(*args)
|
16
|
+
require 'bmg/xlsx'
|
17
|
+
require_relative 'database/xlsx'
|
18
|
+
Xlsx.new(*args)
|
19
|
+
end
|
20
|
+
|
21
|
+
def to_xlsx(*args)
|
22
|
+
require 'bmg/xlsx'
|
23
|
+
Writer::Xlsx.to_xlsx(self, *args)
|
24
|
+
end
|
25
|
+
|
26
|
+
def to_data_folder(*args)
|
27
|
+
DataFolder.dump(self, *args)
|
28
|
+
end
|
29
|
+
|
30
|
+
def each_relation_pair
|
31
|
+
raise NotImplementedError
|
32
|
+
end
|
33
|
+
|
34
|
+
end # class Database
|
35
|
+
end # module Bmg
|
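The contract that `to_data_folder` and `to_xlsx` rely on is small: a database only has to implement `each_relation_pair`. A minimal sketch of that contract follows, using the standard library only. `ToyDatabase` and `dump_as_json` are hypothetical names, and relations are plain arrays of hashes here rather than Bmg relations.

```ruby
require "json"
require "tmpdir"

# Hypothetical in-memory database: a Hash of name => array of tuples.
class ToyDatabase
  def initialize(relations)
    @relations = relations
  end

  # Same contract as Database#each_relation_pair in the hunk above: yields
  # (name, relation) pairs, or returns an Enumerator when no block is given.
  def each_relation_pair
    return to_enum(:each_relation_pair) unless block_given?

    @relations.each { |name, tuples| yield(name, tuples) }
  end
end

# Writes one JSON file per relation, in the spirit of DataFolder.dump.
def dump_as_json(database, folder)
  database.each_relation_pair do |name, tuples|
    File.write(File.join(folder, "#{name}.json"), JSON.generate(tuples))
  end
  folder
end
```

Any writer built this way works for every database flavor, since it never needs to know whether the pairs come from SQL tables, files or spreadsheet sheets.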
data/lib/bmg/error.rb
CHANGED
data/lib/bmg/reader/excel.rb
CHANGED
@@ -1,80 +1 @@
|
|
1
|
-
|
2
|
-
module Reader
|
3
|
-
class Excel
|
4
|
-
include Reader
|
5
|
-
|
6
|
-
DEFAULT_OPTIONS = {
|
7
|
-
sheet: 0,
|
8
|
-
skip: 0,
|
9
|
-
row_num: true
|
10
|
-
}
|
11
|
-
|
12
|
-
def initialize(type, path, options = {})
|
13
|
-
require 'roo'
|
14
|
-
@path = path
|
15
|
-
@options = DEFAULT_OPTIONS.merge(options)
|
16
|
-
@type = type.knows_attrlist? ? type : type.with_attrlist(infer_attrlist)
|
17
|
-
end
|
18
|
-
|
19
|
-
def each
|
20
|
-
return to_enum unless block_given?
|
21
|
-
|
22
|
-
headers = type.attrlist
|
23
|
-
headers = headers[1..-1] if generate_row_num?
|
24
|
-
start_at = @options[:skip] + 2
|
25
|
-
end_at = spreadsheet.last_row
|
26
|
-
(start_at..end_at).each do |i|
|
27
|
-
row = spreadsheet.row(i)
|
28
|
-
init = init_tuple(i - start_at + 1)
|
29
|
-
tuple = (0...headers.size).each_with_object(init){|i,t|
|
30
|
-
t[headers[i]] = row[i]
|
31
|
-
}
|
32
|
-
yield(tuple)
|
33
|
-
end
|
34
|
-
end
|
35
|
-
|
36
|
-
def to_ast
|
37
|
-
[ :excel, @path, @options ]
|
38
|
-
end
|
39
|
-
|
40
|
-
def to_s
|
41
|
-
"(excel #{@path})"
|
42
|
-
end
|
43
|
-
alias :inspect :to_s
|
44
|
-
|
45
|
-
private
|
46
|
-
|
47
|
-
def spreadsheet
|
48
|
-
@spreadsheet ||= Roo::Spreadsheet
|
49
|
-
.open(@path, @options)
|
50
|
-
.sheet(@options[:sheet])
|
51
|
-
end
|
52
|
-
|
53
|
-
def infer_attrlist
|
54
|
-
row = spreadsheet.row(1+@options[:skip])
|
55
|
-
attrlist = row.map{|c| c.to_s.strip.to_sym }
|
56
|
-
attrlist.unshift(row_num_name) if generate_row_num?
|
57
|
-
attrlist
|
58
|
-
end
|
59
|
-
|
60
|
-
def generate_row_num?
|
61
|
-
!!@options[:row_num]
|
62
|
-
end
|
63
|
-
|
64
|
-
def row_num_name
|
65
|
-
case as = @options[:row_num]
|
66
|
-
when TrueClass then :row_num
|
67
|
-
when Symbol then as
|
68
|
-
else nil
|
69
|
-
end
|
70
|
-
end
|
71
|
-
|
72
|
-
def init_tuple(i)
|
73
|
-
return {} unless generate_row_num?
|
74
|
-
|
75
|
-
{ row_num_name => i }
|
76
|
-
end
|
77
|
-
|
78
|
-
end # class Excel
|
79
|
-
end # module Reader
|
80
|
-
end # module Bmg
|
1
|
+
require_relative 'xlsx'
|
data/lib/bmg/reader/xlsx.rb
ADDED
@@ -0,0 +1,80 @@
|
|
1
|
+
module Bmg
|
2
|
+
module Reader
|
3
|
+
class Excel
|
4
|
+
include Reader
|
5
|
+
|
6
|
+
DEFAULT_OPTIONS = {
|
7
|
+
sheet: 0,
|
8
|
+
skip: 0,
|
9
|
+
row_num: true
|
10
|
+
}
|
11
|
+
|
12
|
+
def initialize(type, path, options = {})
|
13
|
+
require 'roo'
|
14
|
+
@path = path
|
15
|
+
@options = DEFAULT_OPTIONS.merge(options)
|
16
|
+
@type = type.knows_attrlist? ? type : type.with_attrlist(infer_attrlist)
|
17
|
+
end
|
18
|
+
|
19
|
+
def each
|
20
|
+
return to_enum unless block_given?
|
21
|
+
|
22
|
+
headers = type.attrlist
|
23
|
+
headers = headers[1..-1] if generate_row_num?
|
24
|
+
start_at = @options[:skip] + 2
|
25
|
+
end_at = spreadsheet.last_row
|
26
|
+
(start_at..end_at).each do |i|
|
27
|
+
row = spreadsheet.row(i)
|
28
|
+
init = init_tuple(i - start_at + 1)
|
29
|
+
tuple = (0...headers.size).each_with_object(init){|i,t|
|
30
|
+
t[headers[i]] = row[i]
|
31
|
+
}
|
32
|
+
yield(tuple)
|
33
|
+
end
|
34
|
+
end
|
35
|
+
|
36
|
+
def to_ast
|
37
|
+
[ :excel, @path, @options ]
|
38
|
+
end
|
39
|
+
|
40
|
+
def to_s
|
41
|
+
"(excel #{@path})"
|
42
|
+
end
|
43
|
+
alias :inspect :to_s
|
44
|
+
|
45
|
+
private
|
46
|
+
|
47
|
+
def spreadsheet
|
48
|
+
@spreadsheet ||= Roo::Spreadsheet
|
49
|
+
.open(@path, @options)
|
50
|
+
.sheet(@options[:sheet])
|
51
|
+
end
|
52
|
+
|
53
|
+
def infer_attrlist
|
54
|
+
row = spreadsheet.row(1+@options[:skip])
|
55
|
+
attrlist = row.map{|c| c.to_s.strip.to_sym }
|
56
|
+
attrlist.unshift(row_num_name) if generate_row_num?
|
57
|
+
attrlist
|
58
|
+
end
|
59
|
+
|
60
|
+
def generate_row_num?
|
61
|
+
!!@options[:row_num]
|
62
|
+
end
|
63
|
+
|
64
|
+
def row_num_name
|
65
|
+
case as = @options[:row_num]
|
66
|
+
when TrueClass then :row_num
|
67
|
+
when Symbol then as
|
68
|
+
else nil
|
69
|
+
end
|
70
|
+
end
|
71
|
+
|
72
|
+
def init_tuple(i)
|
73
|
+
return {} unless generate_row_num?
|
74
|
+
|
75
|
+
{ row_num_name => i }
|
76
|
+
end
|
77
|
+
|
78
|
+
end # class Excel
|
79
|
+
end # module Reader
|
80
|
+
end # module Bmg
|
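The row arithmetic in `each` and `infer_attrlist` above (header row at `1 + skip`, data rows from `skip + 2`, `:row_num` restarting at 1) is easier to check on plain arrays. The sketch below assumes rows are addressed from 1, as with Roo. `tuples_from_rows` and `SHEET` are hypothetical, and the `row_num` option is simplified to a boolean (the reader also accepts a Symbol to rename the attribute).

```ruby
# Hypothetical stand-in for Reader::Excel#each, working on an array of rows
# instead of a Roo sheet.
def tuples_from_rows(rows, skip: 0, row_num: true)
  row = ->(i) { rows[i - 1] } # 1-based access, like spreadsheet.row(i)

  # The header row comes right after the skipped rows.
  headers = row.call(1 + skip).map { |c| c.to_s.strip.to_sym }
  start_at = skip + 2
  end_at = rows.size

  (start_at..end_at).map do |i|
    # :row_num counts data rows from 1, whatever the number of skipped rows.
    init = row_num ? { row_num: i - start_at + 1 } : {}
    headers.each_with_index.each_with_object(init) do |(header, j), tuple|
      tuple[header] = row.call(i)[j]
    end
  end
end

# Sample sheet, made up for this sketch: one title row to skip, then headers.
SHEET = [
  ["Title row to skip"],
  [" sid ", "name"],
  ["S1", "Smith"],
  ["S2", "Jones"]
].freeze
```

`tuples_from_rows(SHEET, skip: 1)` treats the second row as the header, strips the padding around `sid`, and numbers the two data rows 1 and 2.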
data/lib/bmg/sequel.rb
CHANGED
data/lib/bmg/summarizer/bucketize.rb
ADDED
@@ -0,0 +1,82 @@
|
|
1
|
+
module Bmg
|
2
|
+
class Summarizer
|
3
|
+
#
|
4
|
+
# Bucketizer summarizer.
|
5
|
+
#
|
6
|
+
# Example:
|
7
|
+
#
|
8
|
+
# # direct ruby usage
|
9
|
+
# Bmg::Summarizer.bucketize(:qty, :size => 2).summarize(...)
|
10
|
+
#
|
11
|
+
class Bucketize < Summarizer
|
12
|
+
|
13
|
+
# Sets default options.
|
14
|
+
def default_options
|
15
|
+
{ :size => 10 }
|
16
|
+
end
|
17
|
+
|
18
|
+
# Returns the initial memo (a pair of empty arrays)
|
19
|
+
def least()
|
20
|
+
[[], []]
|
21
|
+
end
|
22
|
+
|
23
|
+
# Appends val to the values collected in the memo
|
24
|
+
def _happens(memo, val)
|
25
|
+
memo.first << val
|
26
|
+
memo
|
27
|
+
end
|
28
|
+
|
29
|
+
# Finalizes computation
|
30
|
+
def finalize(memo)
|
31
|
+
buckets = compute_buckets(memo.first, options[:size])
|
32
|
+
buckets = touching_buckets(buckets) if options[:boundaries] == :touching
|
33
|
+
buckets
|
34
|
+
end
|
35
|
+
|
36
|
+
private
|
37
|
+
|
38
|
+
def compute_buckets(values, num_buckets = 10)
|
39
|
+
sorted_values = values.compact.sort
|
40
|
+
sorted_values = sorted_values.map{|v| v.to_s[0...options[:value_length]] } if options[:value_length]
|
41
|
+
sorted_values = sorted_values.uniq if options[:distinct]
|
42
|
+
|
43
|
+
# Calculate the size of each bucket
|
44
|
+
total_values = sorted_values.length
|
45
|
+
bucket_size = (total_values / num_buckets.to_f).ceil
|
46
|
+
|
47
|
+
# Create the ranges for each bucket
|
48
|
+
bucket_ranges = []
|
49
|
+
num_buckets.times do |i|
|
50
|
+
start_index = i * bucket_size
|
51
|
+
break if start_index >= total_values # Ensure we do not exceed the array bounds
|
52
|
+
|
53
|
+
end_index = [(start_index + bucket_size - 1), total_values - 1].min
|
54
|
+
start_value = sorted_values[start_index]
|
55
|
+
end_value = sorted_values[end_index]
|
56
|
+
bucket_ranges << (start_value..end_value)
|
57
|
+
end
|
58
|
+
|
59
|
+
bucket_ranges
|
60
|
+
end
|
61
|
+
|
62
|
+
def touching_buckets(buckets)
|
63
|
+
result = []
|
64
|
+
buckets.each do |b|
|
65
|
+
r_start = result.empty? ? b.begin : result.last.end
|
66
|
+
r_end = b.end
|
67
|
+
result << (r_start...r_end)
|
68
|
+
end
|
69
|
+
result[-1] = (result.last.begin..result.last.end)
|
70
|
+
|
71
|
+
result
|
72
|
+
end
|
73
|
+
|
74
|
+
end # class Bucketize
|
75
|
+
|
76
|
+
# Factors a bucketize summarizer
|
77
|
+
def self.bucketize(*args, &bl)
|
78
|
+
Bucketize.new(*args, &bl)
|
79
|
+
end
|
80
|
+
|
81
|
+
end # class Summarizer
|
82
|
+
end # module Bmg
|
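To see what the bucket computation above produces, here is a standalone sketch of `compute_buckets` and `touching_buckets` as top-level methods. It leaves out the `:value_length` and `:distinct` options, and the early return on an empty list in `touching_buckets` is an addition of this sketch, not part of the hunk.

```ruby
# Splits the sorted values into at most num_buckets ranges of (roughly)
# equal cardinality; each range goes from its first to its last value.
def compute_buckets(values, num_buckets = 10)
  sorted = values.compact.sort
  total = sorted.length
  bucket_size = (total / num_buckets.to_f).ceil

  ranges = []
  num_buckets.times do |i|
    start_index = i * bucket_size
    break if start_index >= total # no values left for further buckets

    end_index = [start_index + bucket_size - 1, total - 1].min
    ranges << (sorted[start_index]..sorted[end_index])
  end
  ranges
end

# Makes consecutive buckets share their boundary: every bucket but the last
# becomes end-exclusive and starts where the previous one ended.
def touching_buckets(buckets)
  return [] if buckets.empty? # guard added in this sketch

  result = []
  buckets.each do |b|
    r_start = result.empty? ? b.begin : result.last.end
    result << (r_start...b.end)
  end
  # The last bucket stays end-inclusive so the largest value is covered.
  result[-1] = (result.last.begin..result.last.end)
  result
end
```

For the values 1 to 10 and three buckets, the bucket size is `ceil(10 / 3.0) = 4`, so the plain buckets are `1..4`, `5..8` and `9..10`, and the touching variant is `1...4`, `4...8` and `8..10`.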
data/lib/bmg/summarizer.rb
CHANGED
data/lib/bmg/version.rb
CHANGED
data/lib/bmg/writer/xlsx.rb
CHANGED
@@ -7,22 +7,36 @@ module Bmg
|
|
7
7
|
}
|
8
8
|
|
9
9
|
def initialize(xlsx_options, output_preferences = nil)
|
10
|
+
require 'write_xlsx'
|
10
11
|
@xlsx_options = DEFAULT_OPTIONS.merge(xlsx_options)
|
11
12
|
@output_preferences = OutputPreferences.dress(output_preferences)
|
12
13
|
end
|
13
14
|
attr_reader :xlsx_options, :output_preferences
|
14
15
|
|
15
16
|
def call(relation, path)
|
16
|
-
require 'write_xlsx'
|
17
17
|
dup._call(relation, path)
|
18
18
|
end
|
19
19
|
|
20
|
+
def self.to_xlsx(database, path)
|
21
|
+
require 'write_xlsx'
|
22
|
+
workbook = WriteXLSX.new(path)
|
23
|
+
database.each_relation_pair do |name, rel|
|
24
|
+
worksheet = workbook.add_worksheet(name)
|
25
|
+
rel.to_xlsx({
|
26
|
+
workbook: workbook,
|
27
|
+
worksheet: worksheet,
|
28
|
+
})
|
29
|
+
end
|
30
|
+
workbook.close
|
31
|
+
end
|
32
|
+
|
20
33
|
protected
|
21
34
|
attr_reader :workbook, :worksheet
|
22
35
|
|
23
36
|
def _call(relation, path)
|
24
37
|
@workbook = xlsx_options[:workbook] || WriteXLSX.new(path)
|
25
38
|
@worksheet = xlsx_options[:worksheet] || workbook.add_worksheet
|
39
|
+
@worksheet = workbook.add_worksheet(@worksheet) if @worksheet.is_a?(String)
|
26
40
|
|
27
41
|
headers = infer_headers(relation.type)
|
28
42
|
before = nil
|
data/lib/bmg/xlsx.rb
ADDED
data/lib/bmg.rb
CHANGED
@@ -24,6 +24,16 @@ module Bmg
|
|
24
24
|
end
|
25
25
|
module_function :csv
|
26
26
|
|
27
|
+
def json(path, options = {}, type = Type::ANY)
|
28
|
+
in_memory(path.load.map{|tuple| TupleAlgebra.symbolize_keys(tuple) })
|
29
|
+
end
|
30
|
+
module_function :json
|
31
|
+
|
32
|
+
def yaml(path, options = {}, type = Type::ANY)
|
33
|
+
in_memory(path.load.map{|tuple| TupleAlgebra.symbolize_keys(tuple) })
|
34
|
+
end
|
35
|
+
module_function :yaml
|
36
|
+
|
27
37
|
def excel(path, options = {}, type = Type::ANY)
|
28
38
|
Reader::Excel.new(type, path, options).spied(main_spy)
|
29
39
|
end
|
@@ -57,6 +67,8 @@ module Bmg
|
|
57
67
|
require_relative 'bmg/relation/materialized'
|
58
68
|
require_relative 'bmg/relation/proxy'
|
59
69
|
|
70
|
+
require_relative 'bmg/database'
|
71
|
+
|
60
72
|
# Deprecated
|
61
73
|
Leaf = Relation::InMemory
|
62
74
|
end
|
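The new `json` and `yaml` readers above both load a list of tuples and symbolize their keys. The sketch below shows the same idea for JSON with the standard library only. `load_json_tuples` and `json_round_trip` are hypothetical, the file is read with `File.read` rather than through the `path.load` call used in the hunk, and keys are symbolized one level deep only (whether `TupleAlgebra.symbolize_keys` goes deeper is not shown in this diff).

```ruby
require "json"
require "tmpdir"

# Hypothetical stdlib-only equivalent of the tuple loading step: parse the
# file, then turn every top-level String key into a Symbol.
def load_json_tuples(path)
  JSON.parse(File.read(path)).map { |tuple| tuple.transform_keys(&:to_sym) }
end

# Writes the tuples to a temporary json file and loads them back.
def json_round_trip(tuples)
  Dir.mktmpdir do |folder|
    path = File.join(folder, "suppliers.json")
    File.write(path, JSON.generate(tuples))
    load_json_tuples(path)
  end
end
```

Symbol keys matter because the rest of the algebra addresses attributes as symbols, for instance `:city` in `summarize([:city], ...)`.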
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: bmg
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.23.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Bernard Lambeau
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2024-
|
11
|
+
date: 2024-06-27 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: predicate
|
@@ -142,6 +142,10 @@ files:
|
|
142
142
|
- lib/bmg.rb
|
143
143
|
- lib/bmg/algebra.rb
|
144
144
|
- lib/bmg/algebra/shortcuts.rb
|
145
|
+
- lib/bmg/database.rb
|
146
|
+
- lib/bmg/database/data_folder.rb
|
147
|
+
- lib/bmg/database/sequel.rb
|
148
|
+
- lib/bmg/database/xlsx.rb
|
145
149
|
- lib/bmg/error.rb
|
146
150
|
- lib/bmg/operator.rb
|
147
151
|
- lib/bmg/operator/allbut.rb
|
@@ -172,6 +176,7 @@ files:
|
|
172
176
|
- lib/bmg/reader/csv.rb
|
173
177
|
- lib/bmg/reader/excel.rb
|
174
178
|
- lib/bmg/reader/text_file.rb
|
179
|
+
- lib/bmg/reader/xlsx.rb
|
175
180
|
- lib/bmg/relation.rb
|
176
181
|
- lib/bmg/relation/empty.rb
|
177
182
|
- lib/bmg/relation/in_memory.rb
|
@@ -272,6 +277,7 @@ files:
|
|
272
277
|
- lib/bmg/sql/version.rb
|
273
278
|
- lib/bmg/summarizer.rb
|
274
279
|
- lib/bmg/summarizer/avg.rb
|
280
|
+
- lib/bmg/summarizer/bucketize.rb
|
275
281
|
- lib/bmg/summarizer/by_proc.rb
|
276
282
|
- lib/bmg/summarizer/collect.rb
|
277
283
|
- lib/bmg/summarizer/concat.rb
|
@@ -300,6 +306,7 @@ files:
|
|
300
306
|
- lib/bmg/writer.rb
|
301
307
|
- lib/bmg/writer/csv.rb
|
302
308
|
- lib/bmg/writer/xlsx.rb
|
309
|
+
- lib/bmg/xlsx.rb
|
303
310
|
- tasks/gem.rake
|
304
311
|
- tasks/test.rake
|
305
312
|
homepage: http://github.com/enspirit/bmg
|